A new toolkit for data-driven maps

People have been rendering maps on the web for a long time, but current approaches still struggle with scientific use cases that we’re trying to address with a new tool. Check out the demo here.

First, some history and context. Web mapping most likely began in 1993 with the Xerox PARC Map Viewer, which rendered maps as GIF images generated by a server. In 2005 Google Maps popularized the now widespread paradigm of “tiled map pyramids,” which divide the world into 2D tiles with more tiles at higher resolutions. In this paradigm, the client renders only the tiles that need to be displayed given the current location and zoom level, stitching them together like a quilt.

Google didn't invent the concept of tiled maps — video games were arguably doing something similar in the 70s and 80s using tiles to render large worlds while fitting under memory constraints, and multi-scale pyramid representations in signal processing date back even further — but the success of Google Maps helped make tiled maps a standard for the web.

For most of the 2000s, tiled web maps used “raster” tiles — static, pre-generated images, typically 256 x 256 pixels in size. Raster tiles are fast to load and easy to render, but limited by their fixed resolution and difficult to manipulate or style interactively on the client, instead requiring server-side regeneration.

In the early 2010s “vector” tiles were introduced. Rather than pre-render points, lines, shapes, and other geometrical elements as pixels, vector tiles store the geometry explicitly, so it can be rendered at arbitrarily high resolution with flexible styling. Vector tiles quickly became a standard, and the open source mapbox-gl-js library from Mapbox made it easy to render vector tiles in JavaScript using WebGL, enabling an exciting ecosystem of developers and use cases.

The problem

As is often the case, scientific use cases push the boundaries of current software.

In many scientific settings — including climate science — we work with numerical, gridded data arising from simulations, observations, or computational analyses. The data can be 2D, like global temperature on a particular day, but they are often multi-dimensional, for example, monthly precipitation (3D) or ocean temperature across depth and time (4D).

It's surprisingly tough to render these data with existing tools.

We can treat them as classic raster tiles by converting them into a pixelated image format like PNG or JPEG. But that makes it difficult to dynamically customize the rendering based on user input — the same reason web mapping moved away from raster tiles in the first place! That approach also requires storing a copy of the data in a format we wouldn’t use otherwise.

Treating these data as vector tiles has its own challenges. We can convert a dense numerical grid to a collection of points, which are a basic geometrical type. Vector tiles can store points, and existing tools can query and render several features per point. That's exactly how we built our forest risk map, and especially for sparse data it’s a workable approach. But it requires producing an inefficient intermediate format, and it offers little flexibility for optimizing how we store, fetch, and combine different dimensions of the data, which for us has become a performance bottleneck.

A solution

Informed by the strengths and weaknesses of existing approaches, we decided to try something new: using a file format designed for multi-dimensional tiles, and using the data in those tiles to render to WebGL directly rather than through an intermediate layer.

First, the file format: Zarr. It's a compressed, binary, chunked format for multi-dimensional arrays. It's become popular in the scientific Python community, especially alongside the Xarray package and the Pangeo Project’s cloud initiatives, but the format itself is rigorously specified and files can be read in many other languages.

Zarr is well-suited to tiled maps. The “chunks” of a Zarr dataset — the blocks into which the multi-dimensional array is divided — are analogous to the “tiles” of a web map, except that chunks can span multiple dimensions, not just space. The data are compressed and they are fast to read/write in cloud-optimized storage (e.g. Google, Amazon, Azure).

A dataset stored in Zarr typically represents one spatial scale — the scale at which the analysis or simulation was performed. However, it's easy to recreate the same dataset at multiple scales, carrying along the other non-spatial dimensions. We wrote a small package ndpyramid to do that conversion, and to define standard metadata for a multi-dimensional spatial pyramid. (In this conversion step we also reproject the data into Web Mercator, which the rest of our toolkit currently requires. See below for more on relaxing this requirement.)
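To make the idea of a multi-scale pyramid concrete, here is a minimal sketch that derives coarser levels from a single 2D grid by averaging 2 x 2 blocks. It is an illustration only and does not reflect how ndpyramid works internally: the function names `coarsen2x` and `buildPyramid` are hypothetical, the grid is assumed to have even dimensions, and no reprojection is performed. For data with extra dimensions, the same step would be applied to each 2D slice.

```javascript
// Build a coarser level by averaging non-overlapping 2 x 2 blocks.
// `data` is a row-major Float32Array of shape [height, width]; both
// dimensions are assumed to be even. Names here are illustrative only.
function coarsen2x(data, height, width) {
  const outH = height / 2
  const outW = width / 2
  const out = new Float32Array(outH * outW)
  for (let i = 0; i < outH; i++) {
    for (let j = 0; j < outW; j++) {
      const top = 2 * i * width + 2 * j // top-left cell of the block
      const bottom = top + width // cell directly below it
      out[i * outW + j] =
        (data[top] + data[top + 1] + data[bottom] + data[bottom + 1]) / 4
    }
  }
  return { data: out, height: outH, width: outW }
}

// Repeatedly coarsen until `levels` levels exist.
// The result is ordered from coarsest (index 0) to finest (last index).
function buildPyramid(data, height, width, levels) {
  const pyramid = [{ data, height, width }]
  while (pyramid.length < levels) {
    const coarsest = pyramid[0]
    pyramid.unshift(coarsen2x(coarsest.data, coarsest.height, coarsest.width))
  }
  return pyramid
}
```

Block averaging is only one possible resampling choice; a real pipeline may prefer a different method depending on the variable.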

Second, we need to put the data on the screen. We wrote a small library for reading Zarr in JavaScript called zarr-js, and we're using the regl library to render fetched chunks of binary data directly via WebGL as either grids or textures. Among alternatives, regl has worked well because it's minimal, performant, and avoids the boilerplate of raw WebGL without the scene graph functionality of three.js and react-three-fiber (both are amazing tools; they just offer more than we needed here).

Finally, because we're showing data on a map, we need to render traditional map layers — roads, rivers, countries, etc. — at the same time. For these layers we're continuing to use mapbox-gl-js with vector tiles, because parsing and rendering vector tiles is a hard problem and mapbox-gl-js solved it well.

Putting it together

We've released an open-source library called @carbonplan/maps that puts all these pieces together. It's a small set of React components for rendering data-driven, gridded, raster maps. Behind the scenes, it synchronizes data fetching via zarr-js, raster data rendering via regl, and vector rendering and interactive controls powered by mapbox-gl-js. The library handles all the tile math, selecting which chunks to load given the current view. Everything is wired for reactivity — map properties are controlled by React props, and the map rerenders when props change, synchronized with the main rendering loop.
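As a rough illustration of the kind of tile math involved, the sketch below implements the standard Web Mercator tile indexing scheme used by tiled web maps. It is not the code inside @carbonplan/maps; the function names `lonLatToTile` and `tilesInView` are hypothetical, and the bounding box is assumed not to cross the antimeridian.

```javascript
// Standard "slippy map" tile indexing for Web Mercator.
// Function names are illustrative, not part of any library API.

// Convert longitude/latitude (degrees) to the x/y index of the tile
// containing that point at a given integer zoom level.
function lonLatToTile(lon, lat, zoom) {
  const n = 2 ** zoom // tiles per side at this zoom level
  const latRad = (lat * Math.PI) / 180
  const x = Math.floor(((lon + 180) / 360) * n)
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  )
  // Clamp so points on the outer edges stay inside the grid.
  const clamp = (v) => Math.min(Math.max(v, 0), n - 1)
  return { x: clamp(x), y: clamp(y), z: zoom }
}

// List every tile overlapping a bounding box given as
// [west, south, east, north] in degrees.
function tilesInView([west, south, east, north], zoom) {
  const topLeft = lonLatToTile(west, north, zoom)
  const bottomRight = lonLatToTile(east, south, zoom)
  const tiles = []
  for (let y = topLeft.y; y <= bottomRight.y; y++) {
    for (let x = topLeft.x; x <= bottomRight.x; x++) {
      tiles.push({ x, y, z: zoom })
    }
  }
  return tiles
}
```

Calling `tilesInView` with the current map bounds and zoom gives the set of tiles, and therefore chunks, that need to be fetched; at zoom 1 the whole world resolves to a 2 x 2 grid of four tiles.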

For the simplest possible example, the following code renders a 2D map of global temperature with a coastline.

import { Map, Raster, Line } from '@carbonplan/maps'
import { useColormap } from '@carbonplan/colormaps'

const bucket = 'https://storage.googleapis.com/carbonplan-maps/'

// Hooks such as useColormap must be called inside a React component.
// The component name TemperatureMap is just an illustrative choice.
const TemperatureMap = () => {
  const colormap = useColormap('warm')

  return (
    <Map>
      <Line
        color={'white'}
        source={bucket + 'basemaps/land'}
        variable={'land'}
      />
      <Raster
        colormap={colormap}
        clim={[-20, 30]}
        source={bucket + 'v2/demo/2d/tavg'}
        variable={'tavg'}
      />
    </Map>
  )
}

The source data is a Zarr group with temperature data from WorldClim at multiple zoom levels. The file layout (with some files hidden for clarity) is as follows — note how the number of chunks quadruples as the zoom level increases. We automatically load metadata from files in this directory.

/
 ├── .zmetadata
 ├── 0
 │   └── tavg
 │       └── 0.0
 ├── 1
 │   └── tavg
 │       ├── 0.0
 │       ├── 0.1
 │       ├── 1.0
 │       └── 1.1
 ├── 2
...
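The layout above can be turned into chunk URLs with a few lines of code. The sketch below assumes the pattern visible in the listing, `<source>/<zoom>/<variable>/<row>.<col>`, with one chunk per tile and chunk indices ordered as (y, x). The helper `chunkUrls` is hypothetical and is not part of zarr-js or @carbonplan/maps.

```javascript
// Enumerate chunk URLs for one zoom level of a 2D pyramid, assuming the
// layout <source>/<zoom>/<variable>/<row>.<col> with one chunk per tile.
// `chunkUrls` is a hypothetical helper, not a library call.
function chunkUrls(source, variable, zoom) {
  const n = 2 ** zoom // chunks per side, so n * n chunks in total
  const base = source.replace(/\/$/, '') // drop any trailing slash
  const urls = []
  for (let row = 0; row < n; row++) {
    for (let col = 0; col < n; col++) {
      urls.push(`${base}/${zoom}/${variable}/${row}.${col}`)
    }
  }
  return urls
}

const source = 'https://storage.googleapis.com/carbonplan-maps/v2/demo/2d/tavg'
console.log(chunkUrls(source, 'tavg', 1))
```

Each zoom level has 2^zoom chunks per side, which is why the chunk count grows by a factor of four from one level to the next.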

With the same component, we can just as easily render a 4D map where the third dimension is the month and the fourth dimension is temperature or precipitation (labeled “band”). Pointing to this dataset, the Raster component becomes the following.

<Raster
  colormap={colormap}
  clim={clim}
  source={bucket + 'v2/demo/4d/tavg-prec-month'}
  variable={'climate'}
  selector={{ band, month }}
/>

Here, the selector prop allows us to index into the full multi-dimensional array. Note that the code sample omits how the inputs to the selector are controlled by a slider and a menu, and how the colormap and clim are based on the selection. The full code is just slightly more complicated. You can also check out a demo with even more options.
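To illustrate what indexing with a selector means, here is a minimal sketch that pulls a single 2D slice out of a flat, row-major 4D array. The dimension order `[band, month, y, x]`, the coordinate values, and the function name `selectSlice` are all assumptions made for the example; they are not taken from the library, and a real dataset declares its own dimensions and coordinates in its metadata.

```javascript
// Extract the 2D slice picked out by a selector from a row-major 4D array.
// The dimension order [band, month, y, x] is an assumption for illustration.
function selectSlice(data, shape, coords, selector) {
  const [nBand, nMonth, ny, nx] = shape
  if (data.length !== nBand * nMonth * ny * nx) {
    throw new Error('data length does not match shape')
  }
  // Translate coordinate values (e.g. 'prec', 3) into integer positions.
  const b = coords.band.indexOf(selector.band)
  const m = coords.month.indexOf(selector.month)
  if (b < 0 || m < 0) throw new Error('selector value not found in coords')
  const size = ny * nx
  const offset = (b * nMonth + m) * size
  // subarray returns a view on the same memory, so nothing is copied.
  return data.subarray(offset, offset + size)
}

// Example with made-up values: 2 bands, 12 months, and a 2 x 2 grid.
const shape = [2, 12, 2, 2]
const data = Float32Array.from({ length: 2 * 12 * 2 * 2 }, (_, i) => i)
const coords = {
  band: ['tavg', 'prec'],
  month: Array.from({ length: 12 }, (_, i) => i + 1),
}
const slice = selectSlice(data, shape, coords, { band: 'prec', month: 3 })
console.log(slice)
```

Because the selected slice is a contiguous run of values, changing the selector only changes an offset, which is part of what makes it cheap to switch between months or bands once a chunk is loaded.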

In more advanced settings, we might want more control over rendering, including the ability to combine data across multiple layers with math. We've exposed a custom fragment shader prop frag to make this easy. In the sample below, we render the average temperature over January and February by loading both months at once and calculating the average on the GPU, followed by a rescaling and colormap lookup. While this requires writing shader code, it lets us combine data layers via arbitrarily complex math, with high performance and full control over what gets rendered. Here’s a Raster component that demos this approach.

<Raster
  colormap={colormap}
  clim={[-20, 30]}
  source={bucket + 'v2/demo/3d/tavg-month'}
  variable={'tavg'}
  selector={{ month: [1, 2] }}
  frag={`
    float average = (month_1 + month_2) / 2.0;
    float rescaled = (average - clim.x)/(clim.y - clim.x);
    gl_FragColor = texture2D(colormap, vec2(rescaled, 1.0));
  `}
/>
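For readers less familiar with shaders, the same per-pixel math can be written as a CPU-side reference in plain JavaScript. This is only a sketch for intuition: the function name `shade` is hypothetical, the three-entry color ramp is made up and is not the actual 'warm' colormap, and the nearest-entry lookup is a simplification of what a GPU texture lookup does.

```javascript
// CPU reference for the shader math: average, rescale by clim, look up color.
// `colormap` is an array of [r, g, b] entries.
function shade(month1, month2, clim, colormap) {
  const average = (month1 + month2) / 2
  const rescaled = (average - clim[0]) / (clim[1] - clim[0])
  // Clamp to [0, 1], then pick the nearest colormap entry.
  // (A GPU texture lookup would typically interpolate between entries.)
  const t = Math.min(Math.max(rescaled, 0), 1)
  const index = Math.round(t * (colormap.length - 1))
  return colormap[index]
}

// Made-up ramp for the example, not a real colormap.
const ramp = [
  [0, 0, 0],
  [128, 64, 0],
  [255, 255, 255],
]
console.log(shade(0, 10, [-20, 30], ramp))
```

On the GPU this calculation runs once per pixel in parallel, which is what makes it practical to combine layers interactively.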

Comparisons

There are many existing approaches to web maps, and this work is one contribution to a rich ecosystem of tools.

Both mapbox-gl-js itself and another library called deck.gl handle a much wider variety of data types than our library does, but we think ours handles gridded raster data more effectively, so in that sense they're highly complementary. While we use mapbox-gl-js inside our library, and plan to continue doing so, we are restricted to using v1 because Mapbox changed its licensing for v2. We are actively following the projects that have forked from v1, as we may ultimately need to switch to one of them, or rebuild some components ourselves.

One major limitation of mapbox-gl-js v1 is that it only supports the Web Mercator projection. Building off tools like d3-geo, we have prototyped rendering raster data with alternative map projections. However, we also need to handle vector tile rendering (and tile indexing) with other projections, which again would involve modifying or replacing our dependency on mapbox-gl-js. This feature is important both because of the representational benefits of other map projections, and because we wouldn’t need to store multiple copies of the same data (unprojected and projected), which gets expensive at terabyte or petabyte scale.

Another effort to load Zarr in JavaScript emerged around the time we started zarr-js. It’s a more full-featured implementation of both Zarr reading and writing in the browser, but we prefer our smaller, read-only library for our mapping use case. It'd be great to find some unification here, and the authors of that library have expressed interest in a more minimal version.

Finally, another recent project called Viv renders large microscopy data stored as Zarr files in the browser. (It also renders data from a microscopy-oriented variant of TIFF called OME-TIFF.) There are both similarities and important differences between rendering microscopy data and rendering maps of the world. Even if it doesn't make sense to merge, we'd love to share learnings across the projects and develop shared standards, especially involving multi-dimensional pyramids stored in Zarr.

Next steps

We're just getting started. We used our library for a production web map for the first time two weeks ago. We'll be building out the library further for our own use cases, and are planning to make it more robust — with validation, tests, etc. — alongside adding new features. It's in the early stages of development, so while we'll adhere to semantic versioning, you should expect frequent updates and breaking changes.

We want to enable others — from front-end developers looking to build new map experiences to Python-wielding climate scientists wanting an easier way to share their datasets on the web — so if you’re working on these problems, please reach out.