<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % BOOK_ENTITIES SYSTEM "Wayland.ent">
%BOOK_ENTITIES;
]>
<chapter id="chap-Wayland-Architecture">
<title>Wayland Architecture</title>
<section id="sect-Wayland-Architecture-wayland_architecture">
<title>X vs. Wayland Architecture</title>
<para>
A good way to understand the Wayland architecture
and how it differs from X is to follow an event
from the input device to the point where the change
it triggers appears on screen.
</para>
<para>
This is where we are now with X:
</para>
<figure>
<title>X architecture diagram</title>
<mediaobjectco>
<imageobjectco>
<areaspec id="map1" units="other" otherunits="imagemap">
<area id="area1_1" linkends="x_flow_1" x_steal="#step_1"/>
<area id="area1_2" linkends="x_flow_2" x_steal="#step_2"/>
<area id="area1_3" linkends="x_flow_3" x_steal="#step_3"/>
<area id="area1_4" linkends="x_flow_4" x_steal="#step_4"/>
<area id="area1_5" linkends="x_flow_5" x_steal="#step_5"/>
<area id="area1_6" linkends="x_flow_6" x_steal="#step_6"/>
</areaspec>
<imageobject>
<imagedata fileref="images/x-architecture.png" format="PNG" />
</imageobject>
</imageobjectco>
</mediaobjectco>
</figure>
<para>
<orderedlist>
<listitem id="x_flow_1">
<para>
The kernel gets an event from an input
device and sends it to X through the evdev
input driver. The kernel does all the hard
work here by driving the device and
	  translating the different device-specific
	  event protocols to the Linux evdev input
event standard.
</para>
</listitem>
<listitem id="x_flow_2">
<para>
The X server determines which window the
event affects and sends it to the clients
that have selected for the event in question
on that window. The X server doesn't
actually know how to do this right, since
the window location on screen is controlled
by the compositor and may be transformed in
a number of ways that the X server doesn't
understand (scaled down, rotated, wobbling,
etc).
</para>
</listitem>
<listitem id="x_flow_3">
<para>
The client looks at the event and decides
what to do. Often the UI will have to change
in response to the event - perhaps a check
box was clicked or the pointer entered a
	  button that must be highlighted. Thus the
	  client sends a rendering request back to the
	  X server (a minimal Xlib sketch of this
	  round trip follows the list).
</para>
</listitem>
<listitem id="x_flow_4">
<para>
When the X server receives the rendering
request, it sends it to the driver to let it
program the hardware to do the rendering.
The X server also calculates the bounding
region of the rendering, and sends that to
the compositor as a damage event.
</para>
</listitem>
<listitem id="x_flow_5">
<para>
The damage event tells the compositor that
something changed in the window and that it
has to recomposite the part of the screen
where that window is visible. The compositor
is responsible for rendering the entire
screen contents based on its scenegraph and
the contents of the X windows. Yet, it has
to go through the X server to render this.
</para>
</listitem>
<listitem id="x_flow_6">
<para>
The X server receives the rendering requests
from the compositor and either copies the
compositor back buffer to the front buffer
or does a pageflip. In the general case, the
X server has to do this step so it can
	  account for overlapping windows (which may
	  require clipping) and determine whether or
	  not it can page flip. However, for a
compositor, which is always fullscreen, this
is another unnecessary context switch.
</para>
</listitem>
</orderedlist>
</para>
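      <para>
        Steps 2 and 3 are visible in any Xlib client. As a minimal
        sketch (not part of the original walkthrough), the client below
        selects for input on its window, waits for the X server to
        deliver events, and answers with rendering requests; error
        handling is omitted:
      </para>
      <programlisting><![CDATA[
#include <X11/Xlib.h>

int
main(void)
{
	Display *dpy = XOpenDisplay(NULL);
	int screen = DefaultScreen(dpy);
	Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy),
	                                 0, 0, 200, 200, 0,
	                                 BlackPixel(dpy, screen),
	                                 WhitePixel(dpy, screen));
	XEvent ev;

	/* Step 2: tell the X server which events to deliver here. */
	XSelectInput(dpy, win, ButtonPressMask | ExposureMask);
	XMapWindow(dpy, win);

	for (;;) {
		XNextEvent(dpy, &ev);	/* event routed to us by the server */

		/* Step 3: react by sending rendering requests back. */
		if (ev.type == ButtonPress || ev.type == Expose)
			XFillRectangle(dpy, win, DefaultGC(dpy, screen),
			               10, 10, 50, 50);
	}
}
]]></programlisting>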
<para>
As suggested above, there are a few problems with this
approach. The X server doesn't have the information to
decide which window should receive the event, nor can it
transform the screen coordinates to window-local
coordinates. And even though X has handed responsibility for
the final painting of the screen to the compositing manager,
X still controls the front buffer and modesetting. Most of
the complexity that the X server used to handle is now
      available in the kernel or self-contained libraries (KMS,
evdev, mesa, fontconfig, freetype, cairo, Qt etc). In
general, the X server is now just a middle man that
introduces an extra step between applications and the
compositor and an extra step between the compositor and the
hardware.
</para>
<para>
In Wayland the compositor is the display server. We transfer
the control of KMS and evdev to the compositor. The Wayland
protocol lets the compositor send the input events directly
to the clients and lets the client send the damage event
directly to the compositor:
</para>
<figure>
<title>Wayland architecture diagram</title>
<mediaobjectco>
<imageobjectco>
<areaspec id="mapB" units="other" otherunits="imagemap">
<area id="areaB_1" linkends="wayland_flow_1" x_steal="#step_1"/>
<area id="areaB_2" linkends="wayland_flow_2" x_steal="#step_2"/>
<area id="areaB_3" linkends="wayland_flow_3" x_steal="#step_3"/>
<area id="areaB_4" linkends="wayland_flow_4" x_steal="#step_4"/>
</areaspec>
<imageobject>
<imagedata fileref="images/wayland-architecture.png" format="PNG" />
</imageobject>
</imageobjectco>
</mediaobjectco>
</figure>
<para>
<orderedlist>
<listitem id="wayland_flow_1">
<para>
The kernel gets an event and sends
it to the compositor. This
is similar to the X case, which is
great, since we get to reuse all the
input drivers in the kernel.
</para>
</listitem>
<listitem id="wayland_flow_2">
<para>
The compositor looks through its
scenegraph to determine which window
should receive the event. The
scenegraph corresponds to what's on
screen and the compositor
understands the transformations that
it may have applied to the elements
in the scenegraph. Thus, the
compositor can pick the right window
and transform the screen coordinates
to window-local coordinates, by
applying the inverse
	  transformations. The types of
	  transformation that can be applied
	  to a window are restricted only by
	  what the compositor can do, as long
	  as it can compute the inverse
	  transformation for the input events
	  (a minimal sketch of such an inverse
	  transformation follows this list).
</para>
</listitem>
<listitem id="wayland_flow_3">
<para>
As in the X case, when the client
receives the event, it updates the
UI in response. But in the Wayland
case, the rendering happens in the
client, and the client just sends a
request to the compositor to
indicate the region that was
updated.
</para>
</listitem>
<listitem id="wayland_flow_4">
<para>
The compositor collects damage
requests from its clients and then
recomposites the screen. The
compositor can then directly issue
an ioctl to schedule a pageflip with
KMS.
</para>
</listitem>
</orderedlist>
</para>
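      <para>
        To make the coordinate handling in step 2 concrete, below is a
        minimal sketch, assuming a compositor that only translates and
        uniformly scales its windows; the struct and function are
        hypothetical, not taken from any real compositor:
      </para>
      <programlisting><![CDATA[
/* Hypothetical sketch: a compositor that only translates and scales
 * its windows can invert those transformations to map a screen
 * coordinate back to a window-local coordinate. */
struct window {
	double x, y;            /* position on screen */
	double scale;           /* uniform scale factor */
	double width, height;   /* size in window-local coordinates */
};

/* Apply the inverse transformation; returns 1 if the screen position
 * falls inside the window. */
int
screen_to_window(const struct window *w, double sx, double sy,
                 double *wx, double *wy)
{
	*wx = (sx - w->x) / w->scale;
	*wy = (sy - w->y) / w->scale;

	return *wx >= 0 && *wx < w->width &&
	       *wy >= 0 && *wy < w->height;
}
]]></programlisting>
      <para>
        A real compositor walks its scenegraph front to back and runs a
        test like this for each window; the first window containing the
        position receives the event. The pageflip in step 4 is likewise
        a single libdrm call, drmModePageFlip(), which wraps the
        corresponding KMS ioctl.
      </para>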
</section>
<section id="sect-Wayland-Architecture-wayland_rendering">
<title>Wayland Rendering</title>
<para>
One of the details I left out in the above overview
is how clients actually render under Wayland. By
removing the X server from the picture we also
removed the mechanism by which X clients typically
render. But there's another mechanism that we're
already using with DRI2 under X: direct rendering.
With direct rendering, the client and the server
share a video memory buffer. The client links to a
rendering library such as OpenGL that knows how to
program the hardware and renders directly into the
buffer. The compositor in turn can take the buffer
and use it as a texture when it composites the
desktop. After the initial setup, the client only
needs to tell the compositor which buffer to use and
when and where it has rendered new content into it.
</para>
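    <para>
      In terms of the core protocol, telling the compositor which
      buffer to use and where new content was rendered comes down to
      three requests on the surface. A minimal sketch, assuming the
      client has already created the wl_surface and rendered into the
      wl_buffer:
    </para>
    <programlisting><![CDATA[
#include <wayland-client.h>

/* After rendering into `buffer`, tell the compositor which buffer to
 * use and which region of it changed. The changes take effect when
 * the surface is committed. */
static void
publish_frame(struct wl_surface *surface, struct wl_buffer *buffer,
              int32_t x, int32_t y, int32_t width, int32_t height)
{
	wl_surface_attach(surface, buffer, 0, 0);        /* which buffer */
	wl_surface_damage(surface, x, y, width, height); /* what changed */
	wl_surface_commit(surface);                      /* apply */
}
]]></programlisting>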
<para>
This leaves an application with two ways to update its window contents:
</para>
<para>
<orderedlist>
<listitem>
<para>
Render the new content into a new buffer and tell the compositor
to use that instead of the old buffer. The application can
allocate a new buffer every time it needs to update the window
	    contents, or it can keep two (or more) buffers around and cycle
	    between them (a sketch of this approach follows the list). The
	    buffer management is entirely under application control.
</para>
</listitem>
<listitem>
<para>
	    Render the new content into the buffer that it previously
	    told the compositor to use. While it's possible to just
render directly into the buffer shared with the compositor,
this might race with the compositor. What can happen is that
repainting the window contents could be interrupted by the
compositor repainting the desktop. If the application gets
interrupted just after clearing the window but before
rendering the contents, the compositor will texture from a
	    blank buffer. The result is that the application window will
	    flicker between blank and half-rendered content. The
traditional way to avoid this is to render the new content
into a back buffer and then copy from there into the
compositor surface. The back buffer can be allocated on the
fly and just big enough to hold the new content, or the
application can keep a buffer around. Again, this is under
application control.
</para>
</listitem>
</orderedlist>
</para>
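    <para>
      For the first strategy, a client typically keeps two buffers and
      relies on the release event on wl_buffer to learn when the
      compositor has stopped reading one of them. A sketch under those
      assumptions (the struct and helper names are made up for
      illustration):
    </para>
    <programlisting><![CDATA[
#include <wayland-client.h>

struct double_buffer {
	struct wl_buffer *buffer;
	int busy;               /* still held by the compositor? */
};

/* The compositor sends a release event when it no longer reads from
 * the buffer; register this with wl_buffer_add_listener(). */
static void
buffer_release(void *data, struct wl_buffer *buffer)
{
	struct double_buffer *b = data;

	b->busy = 0;
}

static const struct wl_buffer_listener buffer_listener = {
	buffer_release
};

/* Pick whichever buffer the compositor has already released. */
static struct double_buffer *
next_free_buffer(struct double_buffer buffers[2])
{
	if (!buffers[0].busy)
		return &buffers[0];
	if (!buffers[1].busy)
		return &buffers[1];

	return NULL;            /* both busy: wait for a release event */
}
]]></programlisting>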
<para>
In either case, the application must tell the compositor
which area of the surface holds new contents. When the
      application renders directly to the shared buffer, the
      compositor needs to be notified that there is new content.
      But also when exchanging buffers, the compositor doesn't
      assume anything changed, and needs a request from the
      application before it will repaint the desktop. The idea is
      that even if an application passes a new buffer to the
      compositor, only a small part of the buffer may be
      different, like a blinking cursor or a spinner.
</para>
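    <para>
      For example, a client that only redrew a blinking cursor can
      restrict the damage to that rectangle, whether or not it also
      attached a new buffer. A short sketch with hypothetical cursor
      geometry:
    </para>
    <programlisting><![CDATA[
#include <wayland-client.h>

/* Flag only the cursor rectangle as damaged, so the compositor can
 * repaint just that part of the screen. The 2x16 cursor size is a
 * made-up example. */
static void
publish_cursor_blink(struct wl_surface *surface, struct wl_buffer *buffer,
                     int32_t cursor_x, int32_t cursor_y)
{
	wl_surface_attach(surface, buffer, 0, 0);
	wl_surface_damage(surface, cursor_x, cursor_y, 2, 16);
	wl_surface_commit(surface);
}
]]></programlisting>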
</section>
<section id="sect-Wayland-Architecture-wayland_hw_enabling">
<title>Hardware Enabling for Wayland</title>
<para>
Typically, hardware enabling includes modesetting/display
and EGL/GLES2. On top of that Wayland needs a way to share
      buffers efficiently between processes. There are two sides
      to that: the client side and the server side.
</para>
<para>
On the client side we've defined a Wayland EGL platform. In
the EGL model, that consists of the native types
(EGLNativeDisplayType, EGLNativeWindowType and
EGLNativePixmapType) and a way to create those types. In
other words, it's the glue code that binds the EGL stack and
its buffer sharing mechanism to the generic Wayland API. The
EGL stack is expected to provide an implementation of the
Wayland EGL platform. The full API is in the wayland-egl.h
header. The open source implementation in the mesa EGL stack
is in wayland-egl.c and platform_wayland.c.
</para>
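    <para>
      From the application's point of view the glue is small. A hedged
      sketch of creating an EGLSurface for a wl_surface with the
      wayland-egl.h API, with error checking omitted and the config
      choice reduced to the bare minimum:
    </para>
    <programlisting><![CDATA[
#include <wayland-client.h>
#include <wayland-egl.h>
#include <EGL/egl.h>

static EGLSurface
create_egl_surface(struct wl_display *display, struct wl_surface *surface,
                   int width, int height, EGLDisplay *egl_display)
{
	static const EGLint attribs[] = {
		EGL_SURFACE_TYPE, EGL_WINDOW_BIT,
		EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
		EGL_NONE
	};
	EGLConfig config;
	EGLint n;
	struct wl_egl_window *native;

	*egl_display = eglGetDisplay((EGLNativeDisplayType) display);
	eglInitialize(*egl_display, NULL, NULL);
	eglChooseConfig(*egl_display, attribs, &config, 1, &n);

	/* wl_egl_window plays the role of EGLNativeWindowType on the
	 * Wayland platform. */
	native = wl_egl_window_create(surface, width, height);

	return eglCreateWindowSurface(*egl_display, config,
	                              (EGLNativeWindowType) native, NULL);
}
]]></programlisting>
    <para>
      Once the client has made a context current on this surface,
      eglSwapBuffers() is what hands the finished buffer over to the
      compositor.
    </para>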
<para>
Under the hood, the EGL stack is expected to define a
vendor-specific protocol extension that lets the client side
EGL stack communicate buffer details with the compositor in
order to share buffers. The point of the wayland-egl.h API
is to abstract that away and just let the client create an
EGLSurface for a Wayland surface and start rendering. The
open source stack uses the drm Wayland extension, which lets
the client discover the drm device to use and authenticate
and then share drm (GEM) buffers with the compositor.
</para>
<para>
      The server side of Wayland is the compositor and core UX for
      the vertical, typically integrating the task switcher, app
      launcher and lock screen into one monolithic application. The
      server runs on top of a modesetting API (kernel modesetting,
      OpenWF Display or similar) and composites the final UI using
      a mix of EGL/GLES2 compositing and hardware overlays, if
      available. Enabling modesetting, EGL/GLES2 and overlays is
something that should be part of standard hardware bringup.
The extra requirement for Wayland enabling is the
EGL_WL_bind_wayland_display extension that lets the
compositor create an EGLImage from a generic Wayland shared
buffer. It's similar to the EGL_KHR_image_pixmap extension
to create an EGLImage from an X pixmap.
</para>
<para>
The extension has a setup step where you have to bind the
EGL display to a Wayland display. Then as the compositor
receives generic Wayland buffers from the clients (typically
when the client calls eglSwapBuffers), it will be able to
pass the struct wl_buffer pointer to eglCreateImageKHR as
the EGLClientBuffer argument and with EGL_WAYLAND_BUFFER_WL
as the target. This will create an EGLImage, which can then
be used by the compositor as a texture or passed to the
modesetting code to use as an overlay plane. Again, this is
implemented by the vendor specific protocol extension, which
on the server side will receive the driver specific details
about the shared buffer and turn that into an EGL image when
the user calls eglCreateImageKHR.
</para>
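    <para>
      A compositor-side sketch of that flow, with the extension entry
      points fetched through eglGetProcAddress(); the function pointer
      declarations are written out here for clarity, extension-presence
      checks and error handling are omitted, and on recent stacks the
      client buffer may arrive as a wl_resource rather than a
      wl_buffer:
    </para>
    <programlisting><![CDATA[
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>
#include <wayland-server.h>

static EGLBoolean (*bind_display)(EGLDisplay, struct wl_display *);
static EGLImageKHR (*create_image)(EGLDisplay, EGLContext, EGLenum,
                                   EGLClientBuffer, const EGLint *);
static void (*image_target_texture)(GLenum, GLeglImageOES);

/* One-time setup: bind the EGL display to the Wayland display so the
 * EGL stack can decode the buffers that clients attach. */
static void
bind_egl_to_wayland(EGLDisplay egl_display, struct wl_display *wl_display)
{
	bind_display = (void *) eglGetProcAddress("eglBindWaylandDisplayWL");
	create_image = (void *) eglGetProcAddress("eglCreateImageKHR");
	image_target_texture =
		(void *) eglGetProcAddress("glEGLImageTargetTexture2DOES");

	bind_display(egl_display, wl_display);
}

/* Called when a client has attached a new buffer: wrap it in an
 * EGLImage and bind it to a GL texture for compositing. */
static GLuint
texture_from_buffer(EGLDisplay egl_display, struct wl_buffer *buffer)
{
	EGLImageKHR image;
	GLuint texture;

	image = create_image(egl_display, EGL_NO_CONTEXT,
	                     EGL_WAYLAND_BUFFER_WL,
	                     (EGLClientBuffer) buffer, NULL);

	glGenTextures(1, &texture);
	glBindTexture(GL_TEXTURE_2D, texture);
	image_target_texture(GL_TEXTURE_2D, image);

	return texture;
}
]]></programlisting>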
</section>
</chapter>