HTML and WebGL Integration

Making HTML and 3D Work Together

Three.js renders to a canvas element. Everything else in a web application is HTML. Buttons, forms, tooltips, panels, data tables, navigation menus. Building a Three.js HTML overlay that makes these two worlds work together is one of the most persistent problems in 3D web development.

The question appears repeatedly across the Three.js forum, Stack Overflow, and Reddit. There's no single dominant answer because the right approach depends on what you are building: labels that track 3D objects, floating panels with detailed information, toolbar controls for the 3D scene, or full application UI that coexists with a 3D viewport.

Annotated Satellite Explorer

Seven annotation hotspots sit on different faces of a 3D satellite. Each label fades in when its surface faces the camera and fades out when it turns away, so you never see all seven at once. The visibility check is a dot product between each annotation's surface normal and the direction from the annotation to the camera: a positive result means the surface faces the camera, a negative result means it is turned away and the label is hidden.

Each frame, Vector3.project() gives the label's screen position and dot(surfaceNormal, toCamera) drives its opacity. The overlay is a pointer-events: none parent containing pointer-events: auto cards.
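
A minimal sketch of that facing check, written against a local Vec3 type so it runs without Three.js. The function name and the fadeBand parameter are illustrative assumptions rather than the demo's actual code, and the normal is assumed to be in world space already.

```typescript
// Local vector type so the sketch has no Three.js dependency.
type Vec3 = { x: number; y: number; z: number };

const sub = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });
const dot = (a: Vec3, b: Vec3): number => a.x * b.x + a.y * b.y + a.z * b.z;
const normalize = (v: Vec3): Vec3 => {
  const len = Math.sqrt(dot(v, v));
  return len === 0 ? { x: 0, y: 0, z: 0 } : { x: v.x / len, y: v.y / len, z: v.z / len };
};

// fadeBand is a hypothetical tuning value: the dot-product range over which
// the label goes from fully transparent (0) to fully opaque (fadeBand and above).
function facingOpacity(anchor: Vec3, surfaceNormal: Vec3, cameraPos: Vec3, fadeBand = 0.25): number {
  const toCamera = normalize(sub(cameraPos, anchor));
  // 1 = surface points straight at the camera, 0 = edge-on, negative = facing away.
  const facing = dot(normalize(surfaceNormal), toCamera);
  return Math.min(1, Math.max(0, facing / fadeBand));
}
```

The returned value can be written straight to the label's CSS opacity. A narrow fadeBand gives a quick fade near the silhouette; a value of 1 fades across the whole front-facing hemisphere.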

The Constraint: Two Rendering Pipelines

The WebGL canvas and the HTML DOM are different rendering pipelines. The canvas paints pixels directly to a bitmap. The DOM is a tree of styled elements managed by the browser's layout engine. They do not naturally interact.

Layering HTML on top of a canvas is trivial: position the overlay over the canvas and give it a higher CSS z-index. Four harder problems remain.

Position Synchronisation

Making an HTML element follow a 3D object as the camera moves. The element's CSS position must be recalculated every frame by projecting the 3D world position to 2D screen coordinates.
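
The projection step can be sketched as a pure function. Vector3.project() leaves a point in normalised device coordinates, where x and y run from -1 to 1 and y points up; the function below converts that to CSS pixels. The names ndcToScreen and ScreenPoint are illustrative, not part of Three.js.

```typescript
// Normalised device coordinates, as produced by projecting a world position
// through the camera (for example with Vector3.project(camera)).
type Ndc = { x: number; y: number; z: number };
type ScreenPoint = { x: number; y: number; visible: boolean };

function ndcToScreen(ndc: Ndc, width: number, height: number): ScreenPoint {
  return {
    x: (ndc.x * 0.5 + 0.5) * width,
    // NDC y points up, CSS y points down, so the axis is flipped.
    y: (-ndc.y * 0.5 + 0.5) * height,
    // Outside [-1, 1] on any axis means the point is off-screen, behind the
    // camera, or beyond the far plane.
    visible: Math.abs(ndc.x) <= 1 && Math.abs(ndc.y) <= 1 && ndc.z >= -1 && ndc.z <= 1,
  };
}
```

The width and height here are the canvas size in CSS pixels, not the drawing-buffer size, so the result lines up with DOM coordinates on high-density displays.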

Pointer Event Conflicts

When HTML elements overlap the canvas, clicks hit either the HTML element or pass through to the 3D scene. Getting this right for overlapping tooltips, clickable labels, and transparent overlay regions requires careful event management.

Depth Ordering

In 3D, an object behind another should be occluded. HTML elements layered on a canvas do not participate in the 3D depth buffer. A label for a hidden object still appears on top unless you manually hide it.
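
One way to hide such a label manually is to cast a ray from the camera towards the label's anchor and compare distances. The sketch below covers only the comparison; it assumes the nearest hit distance comes from elsewhere (in Three.js, typically the first result of Raycaster.intersectObjects). The function name and epsilon default are assumptions.

```typescript
type Vec3 = { x: number; y: number; z: number };

function distanceBetween(a: Vec3, b: Vec3): number {
  const dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return Math.sqrt(dx * dx + dy * dy + dz * dz);
}

// firstHitDistance: distance from the camera to the nearest surface along the
// ray towards the anchor, or null when the ray hits nothing.
// epsilon: tolerance so an anchor sitting on its own object's surface is not
// counted as hidden by that same surface (hypothetical default).
function isLabelOccluded(
  cameraPos: Vec3,
  anchor: Vec3,
  firstHitDistance: number | null,
  epsilon = 1e-3,
): boolean {
  if (firstHitDistance === null) return false;
  return firstHitDistance < distanceBetween(cameraPos, anchor) - epsilon;
}
```

Raycasting per label per frame is not free, so this check is usually throttled or limited to labels that are already on screen.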

Performance at Scale

Updating 500 HTML element positions every frame eats into a frame budget of roughly 16 ms at 60 fps. A single CSS translate is cheap; element count is what bites you.

These four constraints shape every integration decision. The approach that solves one often makes another worse.


The Naive Approach

The common first attempt works with 5 to 10 labels on a simple scene. It breaks with 50+ labels, complex layering, or any scene where occlusion matters.

style.left/top per frame. Positioning HTML elements in pixels calculated from Vector3.project() every frame. Triggers layout recalculation for each element.
Create/remove on hover. Show tooltips by creating a new DOM element on hover and removing it on mouse leave. Creating and removing elements on every interaction forces style recalculation and layout each time.
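
A common alternative to writing style.left/top is a single transform string, written only when it changes. The sketch below is one way to do that; labelTransform and TransformCache are hypothetical helpers, and rounding to whole pixels is a design choice rather than a requirement.

```typescript
// Build a transform that centres the element on (x, y). Transforms can
// usually be applied without re-running layout, unlike left/top.
function labelTransform(x: number, y: number): string {
  return `translate(-50%, -50%) translate(${Math.round(x)}px, ${Math.round(y)}px)`;
}

// Remembers the last transform per label so unchanged labels cause no DOM write.
class TransformCache {
  private last: Record<string, string> = {};

  // Returns the new transform string, or null when nothing changed since the
  // previous call for this id.
  next(id: string, x: number, y: number): string | null {
    const value = labelTransform(x, y);
    if (this.last[id] === value) return null;
    this.last[id] = value;
    return value;
  }
}
```

In the render loop, a non-null result would be assigned to element.style.transform. The same reuse idea applies to tooltips: keep one tooltip element alive and toggle its visibility instead of creating and removing it on each hover.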

The events side is where it really comes apart. A single listener cannot reliably tell which layer the user meant to hit, and depth is ignored entirely.

Single event listener. One click handler that tries to determine whether the user clicked the HTML overlay or the 3D scene underneath. Edge cases multiply with every new interactive element.
Ignore depth. Labels for objects behind other objects remain visible, overlapping and creating visual clutter that makes the interface unreadable.

CSS2DRenderer: Labels and Annotations

CSS2DRenderer is the default starting point for a Three.js HTML overlay. It positions regular HTML elements (styled with CSS, holding any content) to match 3D object positions. CSS2DObject wraps an element and stores a 3D position. Each frame, CSS2DRenderer projects that position to screen coordinates and applies a CSS translate.

Searchable, selectable labels. CSS2D elements are real DOM nodes. Users can select text, screen readers can read them, browsers can search them.
HTML tooltip content. Links, formatted text, images, embedded charts. Any HTML inside a tracking label.

Two further benefits matter more than people expect: collision handling and accessibility. Both fall out of the elements being real DOM nodes rather than pixels.

Screen-space collision avoidance. Because elements are in the DOM, you can detect overlaps and adjust positions to prevent label pile-up.
Accessibility. Screen readers can navigate CSS2D elements. ARIA attributes work. Focus management is possible.
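
Screen-space collision avoidance can be as simple as a greedy pass over label rectangles. The sketch below keeps the highest-priority label in any overlapping group and hides the rest. The LabelBox shape and the priority field are assumptions; in a browser the rectangles would typically come from getBoundingClientRect().

```typescript
type LabelBox = { id: string; x: number; y: number; width: number; height: number; priority: number };

// Axis-aligned rectangle overlap test in screen space.
function overlaps(a: LabelBox, b: LabelBox): boolean {
  return a.x < b.x + b.width && b.x < a.x + a.width &&
         a.y < b.y + b.height && b.y < a.y + a.height;
}

// Greedy placement: visit labels from highest to lowest priority and keep a
// label only if it does not overlap one that was already kept.
function visibleLabelIds(boxes: readonly LabelBox[]): string[] {
  const placed: LabelBox[] = [];
  const sorted = [...boxes].sort((a, b) => b.priority - a.priority);
  for (const box of sorted) {
    if (!placed.some((p) => overlaps(p, box))) placed.push(box);
  }
  return placed.map((b) => b.id);
}
```

Priority could be inverse camera distance or a per-label importance. The pass is quadratic in the worst case, which is fine for tens of labels and worth revisiting for hundreds.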

All of that comes at a per-frame cost, and the cost scales with element count. This is the wall most people hit second, after they have got positioning working.

Performance note: CSS2DRenderer updates every element every frame. At 200+ elements, the DOM update cost climbs sharply. Hide off-screen elements (check projected position against viewport bounds), use intersection observer patterns, or limit visible labels to the nearest N objects.
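
Two of those mitigations, viewport culling and a nearest-N cap, can be sketched together as one selection pass. The LabelCandidate shape and function name are illustrative; the screen positions are assumed to be already projected to CSS pixels.

```typescript
type LabelCandidate = { id: string; screenX: number; screenY: number; distanceToCamera: number };

// Returns the ids of labels worth showing this frame: on-screen only,
// nearest first, capped at maxLabels.
function selectVisibleLabels(
  candidates: readonly LabelCandidate[],
  viewportWidth: number,
  viewportHeight: number,
  maxLabels: number,
): string[] {
  return candidates
    .filter((c) => c.screenX >= 0 && c.screenX <= viewportWidth &&
                   c.screenY >= 0 && c.screenY <= viewportHeight)
    .sort((a, b) => a.distanceToCamera - b.distanceToCamera)
    .slice(0, maxLabels)
    .map((c) => c.id);
}
```

Labels not in the returned list can be hidden outright, which keeps them out of the per-frame update as well as out of view.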


Drei Html: The React Three Fiber Route

If you build with React Three Fiber, you will not call CSS2DRenderer directly. Drei's Html component wraps the same idea in a declarative form, and it is the route most R3F developers reach for first. You place it in the scene graph like any other element, and it tracks its 3D position for you.

What Drei Html gives you for free

The occlude prop hides the element when geometry sits in front of it, which is the depth behaviour CSS2DRenderer cannot do on its own. distanceFactor scales the element with camera distance so labels shrink as they recede. A zIndexRange prop sets the range of z-index values assigned to these elements, which controls how they stack against each other and the rest of the page.

The trade-off is the usual React one: it ties this part of your UI to the R3F render cycle. For a vanilla Three.js app, CSS2DRenderer stays the simpler choice.


CSS3DRenderer: Panels in 3D Space

CSS3DRenderer places HTML elements into the 3D scene as if they were 3D objects. Elements have perspective, rotate with the scene, and honour the camera's 3D positioning. Useful for in-scene panels, floating screens, and information displays that feel like part of the 3D world.

When to use CSS3DRenderer

Panels that should feel like physical objects in the scene: information kiosks, floating dashboards, annotation planes. Content that needs to rotate and scale with the 3D perspective. Interfaces where the boundary between UI and 3D should blur.

The trade-off: CSS3D elements do not interact with the WebGL depth buffer. Layered above the canvas, they always render on top of it. For proper depth integration you need careful masking; otherwise you accept the limitation.


Canvas-Based UI: Text as Texture

For labels that must participate in 3D depth (visible or hidden based on position relative to other objects), render text to a 2D canvas and use it as a texture on a sprite or plane mesh. The result is a regular 3D object: it participates in depth testing, can be hit by raycasting, and is occluded by objects in front of it.

1. Create an offscreen canvas element. Set dimensions based on expected text length and font size.

2. Draw text using the Canvas 2D API. Set font, colour, alignment, background.

Steps 1 and 2 produce a finished bitmap. The rest hands that bitmap to Three.js and puts it in the scene as a real object.

3. Create a Three.js Texture from the canvas. Set filtering and wrapping.

4. Apply the texture to a SpriteMaterial (sprites always face the camera), then add the Sprite to the scene. It now renders as a 3D object with full depth integration.
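
The steps above can be sketched as follows. Steps 1 and 2 are written against small local interfaces that mirror the subset of the Canvas 2D API used, so the code runs outside a browser; drawLabel, its defaults, and the colours are assumptions. Steps 3 and 4 appear as comments because they need Three.js and a real canvas.

```typescript
// Local stand-ins for the parts of the canvas and 2D context used here.
interface LabelContext {
  font: string;
  fillStyle: unknown;
  textAlign: string;
  textBaseline: string;
  measureText(text: string): { width: number };
  fillRect(x: number, y: number, w: number, h: number): void;
  fillText(text: string, x: number, y: number): void;
}
interface LabelCanvas {
  width: number;
  height: number;
  getContext(kind: "2d"): LabelContext | null;
}

// Sizes the canvas to the text and draws it. Returns width / height so the
// sprite can be scaled without stretching the label.
function drawLabel(canvas: LabelCanvas, text: string, fontSizePx = 32, paddingPx = 8): number {
  const ctx = canvas.getContext("2d");
  if (!ctx) throw new Error("2D context unavailable");
  const font = `${fontSizePx}px sans-serif`;

  // Step 1: measure with the target font, then size the canvas.
  ctx.font = font;
  canvas.width = Math.ceil(ctx.measureText(text).width) + paddingPx * 2;
  canvas.height = fontSizePx + paddingPx * 2;

  // Resizing a real canvas resets its context state, so the font is set again.
  ctx.font = font;

  // Step 2: background, then centred text.
  ctx.fillStyle = "rgba(0, 0, 0, 0.7)";
  ctx.fillRect(0, 0, canvas.width, canvas.height);
  ctx.fillStyle = "#ffffff";
  ctx.textAlign = "center";
  ctx.textBaseline = "middle";
  ctx.fillText(text, canvas.width / 2, canvas.height / 2);

  return canvas.width / canvas.height;
}

// Steps 3 and 4 in a browser with Three.js (not executed here):
//   const texture = new THREE.CanvasTexture(canvas);
//   texture.minFilter = THREE.LinearFilter;
//   const sprite = new THREE.Sprite(new THREE.SpriteMaterial({ map: texture }));
//   sprite.scale.set(aspect * worldHeight, worldHeight, 1); // worldHeight: hypothetical size in scene units
//   scene.add(sprite);
```

When the text changes, the canvas is redrawn and the texture's needsUpdate flag is set so the new bitmap is uploaded.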

The trade-offs: canvas-rendered text is rasterised at a fixed resolution (blurry when zoomed), updating text content means redrawing the canvas and reuploading the texture, and you lose CSS styling and DOM interactivity entirely.


In-Scene UI Libraries

For applications that need buttons, sliders, and input fields inside the 3D scene (VR/AR interfaces, immersive applications), dedicated libraries provide layout and interaction inside WebGL.

three-mesh-ui

UI panels as 3D meshes. MSDF fonts, block layout. Designed for WebXR.

pmndrs/uikit

For R3F. Flexbox in 3D space. Buttons, inputs, scrolling.

Troika 3D UI

CSS-inspired positioning and sizing. Flexbox-style layout.

These are appropriate when the entire experience is 3D (VR, immersive installations). For standard web applications with a 3D viewport, HTML overlays are simpler, more accessible, and offer the full power of CSS styling.

Approach | Positioning | Depth | Performance | Accessibility
CSS2DRenderer | Tracks 3D, screen-space | No depth integration | Good to 200 elements | Full DOM access
CSS3DRenderer | Full 3D perspective | No depth buffer | Good to 50 elements | Full DOM access
Canvas texture | Full 3D positioning | Participates in depth | Scales to thousands | No DOM access
In-scene UI | Full 3D positioning | Participates in depth | Moderate | No DOM access

Pointer Event Management

The most common interaction bug: clicking a tooltip closes it and simultaneously selects a 3D object behind it. Or clicking a 3D object does nothing because an invisible overlay absorbs the event. Correct pointer handling requires explicit layering.

1. HTML layer receives first

HTML elements sit above the canvas (z-index). They receive pointer events first. When a click hits an HTML element, stop propagation. Use pointer-events: none on transparent overlays; pointer-events: auto on interactive elements within.

2. Canvas layer receives passthrough

The canvas receives pointer events only when they pass through the HTML layer. Use Three.js raycasting on canvas events to determine which 3D object was hit.

3. Coordination flag

Set an isOverUI flag when the pointer enters an HTML element. Clear it on leave. In the canvas event handler, check this flag and skip raycasting if the pointer is over UI.
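
A minimal sketch of the canvas side of this layering: converting a pointer position to the normalised coordinates raycasting expects, gated by the coordination flag. PointerCoordinator and its method names are hypothetical, and the flag is kept as a counter here (an assumption) so nested UI elements that each report enter and leave do not clear it early.

```typescript
type Rect = { left: number; top: number; width: number; height: number };
type Ndc2 = { x: number; y: number };

// Client coordinates to normalised device coordinates: x and y in [-1, 1],
// y pointing up. canvasRect would come from canvas.getBoundingClientRect().
function pointerToNdc(clientX: number, clientY: number, canvasRect: Rect): Ndc2 {
  return {
    x: ((clientX - canvasRect.left) / canvasRect.width) * 2 - 1,
    y: -((clientY - canvasRect.top) / canvasRect.height) * 2 + 1,
  };
}

class PointerCoordinator {
  private overUiCount = 0;

  // Call from pointerenter / pointerleave handlers on interactive HTML elements.
  enterUi(): void { this.overUiCount += 1; }
  leaveUi(): void { this.overUiCount = Math.max(0, this.overUiCount - 1); }
  isOverUI(): boolean { return this.overUiCount > 0; }

  // Returns coordinates for raycasting, or null when the HTML layer owns the pointer.
  canvasPointer(clientX: number, clientY: number, canvasRect: Rect): Ndc2 | null {
    return this.isOverUI() ? null : pointerToNdc(clientX, clientY, canvasRect);
  }
}
```

A non-null result would be passed to Raycaster.setFromCamera along with the camera before intersecting the scene.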

Touch devices: Touch has no persistent hover state, so there is no mouseenter firing before a click. Use pointerdown/pointerup rather than click for both layers. On pointerdown, check whether the target is an HTML interactive element: if it is, handle it in the DOM; if not, forward to raycasting.
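
The pointerdown check can be sketched as a small routing function. It relies only on a closest() method, which DOM elements provide; the "[data-ui]" marker attribute and the function name are assumptions for illustration.

```typescript
// Minimal stand-in for the part of a DOM Element used here.
interface ClosestCapable {
  closest(selector: string): unknown;
}

type Route = "dom" | "raycast";

// Decide who handles a pointerdown. uiSelector marks interactive HTML;
// "[data-ui]" is a hypothetical convention, not a standard attribute.
function routePointerDown(target: ClosestCapable | null, uiSelector = "[data-ui]"): Route {
  if (target === null) return "raycast";
  // closest() returns the element itself or its nearest matching ancestor,
  // so a tap on text inside a card still counts as hitting the card.
  return target.closest(uiSelector) != null ? "dom" : "raycast";
}
```

Because the decision is made on pointerdown from the event target alone, it does not depend on a hover state that touch input never produces.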


Layout Patterns

The arrangement of 3D viewport and HTML controls follows three common patterns, each suited to different application types.

Split View

3D viewport occupies 60-70% of the screen. Side panel shows details, controls, and data tables. Draggable splitter for user control.

Use ResizeObserver on the canvas container (not window resize) to catch both window and splitter-driven size changes.
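
What the observer callback computes can be sketched as a pure function. It takes the container's CSS size (for example from the ResizeObserver entry's contentRect) and returns the values a renderer and camera need. The function name and the pixel-ratio cap of 2 are assumptions.

```typescript
type ViewportSize = {
  cssWidth: number;
  cssHeight: number;
  bufferWidth: number;   // drawing-buffer size in device pixels
  bufferHeight: number;
  pixelRatio: number;
  aspect: number;        // for the camera's aspect ratio
};

// Returns null for a collapsed or hidden container so the caller can skip the
// resize instead of producing a zero-sized buffer or a NaN aspect.
function computeViewport(
  cssWidth: number,
  cssHeight: number,
  devicePixelRatio: number,
  maxPixelRatio = 2, // hypothetical cap to limit fill cost on dense displays
): ViewportSize | null {
  if (cssWidth <= 0 || cssHeight <= 0) return null;
  const pixelRatio = Math.min(devicePixelRatio, maxPixelRatio);
  return {
    cssWidth,
    cssHeight,
    bufferWidth: Math.floor(cssWidth * pixelRatio),
    bufferHeight: Math.floor(cssHeight * pixelRatio),
    pixelRatio,
    aspect: cssWidth / cssHeight,
  };
}
```

In a Three.js app the result would typically feed renderer.setPixelRatio, renderer.setSize, and camera.aspect followed by camera.updateProjectionMatrix(). If a CSS2DRenderer is in use, its setSize needs the same CSS dimensions so labels stay aligned.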

Floating Panels

Information panels floating above the 3D scene, positioned absolutely. Property inspectors, sensor details, configuration forms.

Position relative to the viewport, not the 3D scene. A panel tracking a hidden object creates cognitive dissonance.

HUD Overlay

Persistent status information: current selection, coordinates, tool mode, camera position. Fixed relative to the viewport.

Keep minimal. Every pixel of overlay is a pixel of 3D scene the user cannot see or interact with.


The Business Link

HTML-WebGL integration determines how professional and usable a 3D application feels. The difference between a finished product and a frustrating prototype comes down to these details. It is the same gap that separates a polished 3D product configurator from a tech demo.

The economics: The patterns here are straightforward engineering, not novel research. The cost of getting them right is a few days of careful implementation. The cost of getting them wrong is an application that frustrates users on every interaction. Crisp, responsive option panels next to a smooth 3D viewer feel polished. Laggy labels, phantom clicks, and overlapping tooltips feel broken.

