PostGIS - A Real-World Example

Symphony

January 9th, 2022

In this article, we will put together some of the basic spatial data types and operations while leveraging PostGIS to build something that resembles a real-world application that can be useful for our daily lives. The focus will be on the database layer of the application, as that’s where most of the GIS magic happens, but we will ultimately build a full-stack application featuring PostGIS database, Java Spring Boot back-end, and HTML/JavaScript front-end.

Most mobile devices are equipped with a GPS locator that can be used to determine the current location of the device. Our goal is to build a PostGIS database that stores GPS coordinates coming from a client and then combines them with the static terrain elevation data to compute the distance traveled, elevation gained, average speed, and other statistics, similar to how some hiking/running/biking applications work.

To accomplish this goal, we need to get acquainted with raster and vector spatial data types and some of the basic PostGIS functions that will allow us to combine all that data and then analyze it to extract the data needed to compute the desired output values. We will also get to know some basics of REST controller implementation in Java SpringBoot as well as geospatial visualization using LeafletJS.

Elevation Raster Data

The first thing we need is to find a data source for terrain elevation that covers our area of interest. This data is commonly referred to as “DEM”, short for “Digital Elevation Model”. DEMs are typically stored in GeoTIFF or similar raster file formats with a single floating-point band where values of each pixel represent terrain elevation at that location. DEMs usually have a resolution between 1m/px to 50m/px, meaning that each pixel represents a square on the surface of the earth and we capture only one elevation value for each of those squares. Any variation of elevation within each square would be lost. Resolution should be picked based on data availability and also the use case. When we do not need extreme precision, 20-50 m/px resolution is just fine. Besides, higher resolution files tend to be huge and require more storage and processing power - a 1 km squared area requires a million pixels in 1 m/px resolution, but only 400 pixels in 50 m/px resolution, so keep that in mind when doing an estimation based on the area of interest.

In a way, DEMs can be thought of as rectangular matrices with floating-point values, but they can also be visualized as a 3D terrain model, as shown below.

There are many free DEM data providers available and they typically cover one part of the globe. For the purpose of this example, we will use copernicus.eu which covers the territory of Europe and we will download one of the tiles that cover our area of interest - that is tile labeled E50N20 covering most of the Balkans. We can download as many tiles as we need to cover a wider territory, we just need to place all the extracted GeoTIFF files in the same directory after downloading before we can move on to the next step.

Note that raster files can be very large as they contain a lot of information. One DEM tile from the copernicus weighs over 5GB! Covering large territories requires a lot of storage space and processing time when inserting the data.

Now we can run a series of commands that will process raw DEM files, prepare them for importing into PostGIS and execute the script that will set up a table and populate it with the data.

raster2pgsql -I -C -M *.TIF -F -t 100x100 -s 3035 public.elevation > elev.sql

psql -U bancika -d example -f elev.sql

The first command takes all TIF files in the current directory, chops them into smaller tiles that are 100x100 pixels, and creates a valid SQL script that produces a table named “public.elevation” and inserts all those smaller tiles into the table. By chopping large rasters into many smaller ones, we are reducing the load on PostGIS while inserting the data, but also while reading the data, as we will typically need to process small parts of the raster at any given time, so it’s much more efficient to load a few small tiles and process them in-memory than to pull a 5GB raster and then extract a small piece of it on the fly. Tile sizes between 100px and 200px are a good starting point for most applications.

In addition to this, the “-I” command will cause the raster2pgsql tool to include a spatial index on the elevation table to improve the performance of spatial queries against rasters. Another important flag is the “-s 3035” that signals that our raster is using SRID 3035. We must make sure that the correct SRID is provided and that it matches the SRID used by the data provider. In this case, Copernicus uses SRID 3035, so we need to use that as well.

Finally, psql command executes the script created in the first step against the local database named “example” and stores rasters into the database. This can take a few minutes as there’s a lot of data to process. We can verify the results by checking if the elevation table is really there and how many records are stored.

Running the query below shows that we have over 120k smaller tiles that were made from a single large raster file.

select count(*) from elevation

Location Vector Data

Next, we can create a table that will be used to store locations. We will use SRID 4326 for location storage because that is a standard for GPS coordinates. We need to keep that in mind because it is not the same as our elevation raster SRID and that discrepancy will be addressed later. Below is the simplest table schema that tracks only one series of locations over time. There is no support for multiple users or any advanced features, but it can be easily extended to support that.

create table locations (

id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),

point geometry(Point,4326),

time timestamp DEFAULT CURRENT_TIMESTAMP

);

create index locations_point_idx ON locations using GIST(point);

Now we need to simulate a client application running on a mobile device and periodically sending GPS location updates to the back-end. We will use a short PgSQL script that starts at a preset location and then randomly moves the point on a map 45 times for up to 1km from the previous point, with one minute apart from every two consecutive points. At the end it will return to the starting point, closing the loop. The coordinates of the starting location chosen for the purpose of this test pinpoint a mountain top, not too far from the Symphony hub in Niš.

do $$

declare

_max_move float;

_start_loc geometry;

begin

_max_move := 1 / 110.0; -- 1km

_start_loc := ST_MakePoint(22.57925,43.36054); -- babin zub

-- start point

delete from locations;

insert into locations (point)

select _start_loc;

-- add random points

for counter in 1..45 loop

insert into locations (point, time)

select ST_Translate(existing.point, random() * _max_move - _max_move / 2, random() * _max_move - _max_move / 2),

time + (1 * interval '1 minute')

from

(SELECT * FROM locations ORDER BY time desc LIMIT 1) existing;

end loop;

-- end point

insert into locations (point, time)

select _start_loc,

time + (1 * interval '1 minute')

from

(SELECT * FROM locations ORDER BY time desc LIMIT 1) existing;

end; $$

Spatial Queries

Now that we have all the input data stored, we can play with spatial queries to combine it and extract useful information. We can join locations with elevation and get elevation at each location by running the query below.

select l.time, ST_AsText(point) as location, ST_Value(e.rast, ST_Transform(l.point, 3035)) as elevation

from locations l

inner join elevation e ON ST_Intersects(ST_ConvexHull(e.rast), ST_Transform(l.point, 3035));

This query finds a raster that intersects with each of the location points, joins the two tables, and extracts elevation at each point using ST_Value function.

As mentioned earlier, SRIDs of the two tables do not match, so we need to do some conversion using ST_Transform function before using spatial operation ST_Intersects. Since we have built a spatial index on convex hulls of rasters while importing the data, that part of the query will be fast. However, even though we have built an index on location.point, it will not be used here because we are transforming locations to SRID 3035 and the index will not work on the transformed points. There are ways to address this problem and optimize this query but we will cover it in one of the more advanced articles.Using the query above as a starting point, we can write a more advanced query that computes ascent and descent on our trip.

with data_points as (

select l.time, ST_Value(e.rast, ST_Transform(l.point, 3035)) as elevation,

RANK () OVER (ORDER BY time) as ordinal

from locations l

inner join elevation e ON ST_Intersects(ST_ConvexHull(e.rast), ST_Transform(l.point, 3035))

), elevation_deltas as (

select dp1.ordinal,

dp2.elevation - dp1.elevation as delta,

case when dp1.elevation < dp2.elevation then 'ascent' else 'descent' end as direction

from data_points dp1

inner join data_points dp2 on dp2.ordinal = dp1.ordinal + 1

)

select (select sum(delta) from elevation_deltas where direction = 'ascent') as totalAscent,

(select sum(delta) from elevation_deltas where direction = 'descent') as totalDescent

Producing the output

The first CTE (Common Table Expression, defined under the WITH block) is basically a join between locations and elevations that we already covered before. The only addition is the ordinal column that assigns consecutive integers to each record from the joined tables. We use those ordinals in the second CTE to compute differences between the elevations at every two consecutive locations and finally, we sum up all the positives and all the negatives. Since we built our test dataset with the last location equal to the first location, we expect that the ascent and descent will be equal.

Another interesting query we can run is to compute the total distance traveled and average speed.

with data_points as (

select l.time, l.point,

RANK () OVER (ORDER BY time) as ordinal

from locations l

), distances AS (

select ST_DistanceSphere(dp1.point, dp2.point) as distance, dp1.time

from data_points dp1

inner join data_points dp2 on dp2.ordinal = dp1.ordinal + 1

)

select sum(distance) / 1000 as total_distance_km,

extract(epoch from max(time) - min(time)) / 3600 as total_hours,

(sum(distance) / 1000) / (extract(epoch from max(time) - min(time)) / 3600) as avg_speed_kmh

from distances;

Again we use a similar starting data set, but now we are focused on distances between the points, ignoring the elevation component. We are using ST_DistanceSphere to compute the distance between every two consecutive locations and then use those distances to compute the total distance and the average speed.

Lastly, we can build a view that generates a linestring out of individual location points, representing the route traveled. Without it, we only have a set of points with no way of knowing for sure how they are connected.

create view route as (

with data_points as (

select point

from locations

order by time

)

select 1 as id, ST_MakeLine(point) as line

from data_points

);

One of the ways to deliver the route linestring is by serializing the “route.line” column to GeoJSON. GeoJSON is a widely used format for serializing spatial vector objects and is based on JSON format.

PostGIS function ST_AsGeoJSON converts a geometry to a GeoJSON feature, as shown in the query below.

select ST_AsGeoJSON(line)

from route

In order to use the results of this query for display, we will need to add more information to the GeoJSON, such as the header, CRS definition, feature properties. This will be handled in the next step.

QGIS Validation

QGIS is an industry-standard geospatial visualization tool that supports many data sources and has a ton of features for processing and analyzing that data. It is not part of the application per se, but it can be very useful for debugging or manual data preparation or analysis. In this case, we can use it to do a quick validation of the data by connecting QGIS to our local PostGIS database in order to visualize the “route” view and individual location points over satellite images coming from Google maps.

First, we want to connect QGIS to Google maps and fetch satellite tiles. In the “Browser” pane, navigate to “XYZ Tiles” and create a new connection using the following URL

http://mt0.google.com/vt/lyrs=s&hl=en&x={x}&y={y}&z={z}

When the new connection is created, drag it to the “Layers” panel or simply double click on it to add it to the top of the empty project. This will be a background layer for our QGIS project.

Now we need to connect QGIS to our local PostGIS instance. Again, we are going to the “Browser” panel, but this time we need to create a new connection under “PostGIS”, specifying the properties of the local Postgres database. If the connection is successful, QGIS should show everything in the database that can be pulled in and visualized on the map. It is smart enough to recognize spatial columns in all tables and offer them here. Icons next to the objects designate the type of data (points, lines, polygons, rasters, etc).

If we drag the “locations” table and “route” view to the map and play with the style of those layers, we should see something like the screenshot below (minus the marker for the starting location that is added manually for the purpose of the article). This is our entire 45-minute route on Babin Zub mountain peak.

QGIS can also be used to visualize the DEM file(s) we downloaded at the beginning of the article. After dropping the DEM TIF file to QGIS, above the Google satellite images, it shows a grayscale render of the DEM with the lowest values rendered black and the highest values rendered in white. QGIS automatically detects the value range and applies the grayscale gradient for the in-between values, as shown on the screenshot below). There are many other ways to render a DEM, with the 3D-like “Hillshade” being a commonly used option.

The screenshot below shows a DEM rendered on top of a satellite image. We can clearly see the coverage of the DEM and where the mountains and the flats are.

Minimalistic Back-End

Now that we have a functional PostGIS database, we can build a minimalistic Java Spring Boot back-end. Since all of our GIS-related heavy lifting is done by the database, all we need to do here is to create a REST controller that will work as a proxy between the clients and the database. We will need the following endpoints exposed:

POST /locations/{lat}/{lon} for updating the current location from the client’s device.
GET /walk-route for retrieving the route traveled represented as a GeoJSON.
GET /walk-stats for retrieving the statistics of the walk.

The queries embedded in the code below are similar to the ones we ran directly against the database in the previous exercise. The getWalkStats() method is using JDBC’s queryForObject method that maps the results of a query to an instance of WalkStatsDto class using Hibernate’s BeanPropertyRowMapper. In the getWalkRoute() method we are building a fully-fledged GeoJSON using the feature GeoJSON coming from PostGIS.

@RestController

public class GISController {

@Autowired

private JdbcTemplate jdbcTemplate;

@PostMapping(path = "/locations/{lat}/{lon}")

public ResponseEntity<Void> postLocation(@PathVariable(name = "lat") double lat,

@PathVariable(name = "lon") double lon) {

jdbcTemplate.update("insert into locations (point, time) values (ST_MakePoint(?, ?), ?)",

lon, lat, LocalDateTime.now());

return ResponseEntity.ok(null);

}

@GetMapping(path = "/walk-route")

public ResponseEntity<String> getWalkRoute() {

String feature = jdbcTemplate.queryForObject("select ST_AsGeoJSON(line) from route", String.class);

String geoJson = """

{

"type": "FeatureCollection",

"name": "walk-route",

"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },

"features": [

{ "type": "Feature", "properties": { "name": "walk-route" }, "geometry":

""" + feature + """

]

}""";

return ResponseEntity.ok(geoJson);

}

@GetMapping(path = "/walk-stats")

public ResponseEntity<WalkStatsDto> getWalkStats() {

String statsQuery = """

with points_with_elevation as (

select l.time, l.point, ST_Value(e.rast, ST_Transform(l.point, 3035)) as elevation,

RANK () OVER (ORDER BY time) as ordinal

from locations l

inner join elevation e ON ST_Intersects(ST_ConvexHull(e.rast), ST_Transform(l.point, 3035))

), elevation_deltas as (

select dp1.ordinal,

dp2.elevation - dp1.elevation as delta,

case when dp1.elevation < dp2.elevation then 'ascent' else 'descent' end as direction

from points_with_elevation dp1

inner join points_with_elevation dp2 on dp2.ordinal = dp1.ordinal + 1

), distances AS (

select ST_DistanceSphere(dp1.point, dp2.point) as distance, dp1.time

from points_with_elevation dp1

inner join points_with_elevation dp2 on dp2.ordinal = dp1.ordinal + 1

)

select

(select sum(delta) from elevation_deltas where direction = 'ascent') as totalAscent,

(select sum(delta) from elevation_deltas where direction = 'descent') as totalDescent,

sum(distance) / 1000 as totalDistanceKm,

extract(epoch from max(time) - min(time)) / 3600 as totalHours,

(sum(distance) / 1000) / (extract(epoch from max(time) - min(time)) / 3600) as avgSpeedKmh

from distances;

""";

var rowMapper = BeanPropertyRowMapper.newInstance(WalkStatsDto.class);

WalkStatsDto walkStatsDto = jdbcTemplate.queryForObject(statsQuery, rowMapper);

return ResponseEntity.ok(walkStatsDto);

}

@Data

@Builder

@NoArgsConstructor

private class WalkStatsDto {

private double totalAscent;

private double totalDescent;

private double totalDistanceKm;

private double totalHours;

private double avgSpeedKmh;

}

Please note that the whole back-end code is in the same class for the sake of simplicity, even though it is generally a really bad idea. For the same reason, we are omitting ORM, logging, proper error handling, and the configuration, as those transcend the scope of this article.

Minimalistic Front-End

The last piece of this GIS puzzle is the front-end web application that is running in a browser on the user's mobile device. For the sake of simplicity we will use native HTML/JavaScript code, supporting the following features:

Show satellite images as a base layer, using Google’s satellite imagery
Draw walking route as an optional overlay layer on top of the base layer
Show walk statistics under the map
Once a minute, send the current location coordinates to the back-end

For GIS visualization we will integrate LeafletJS plugin - a powerful and flexible visualization tool used by many well-known websites. It allows us to build a map like a cake - layer by layer. At the bottom of the GIS cake, we have one or more base layers that a user can toggle between, showing exactly one layer at a time. On top of the base layer, it can show one or more overlay layers at the same time. In addition to layers, it can show markers, custom legends, etc.

In order to obtain the current location, we are using navigator.geolocation from Geolocation API. This functionality is managed by the browser and will typically ask the user for permission to access the location.

As with the back-end example, the whole code will be packaged in a single index.html file for the sake of simplicity.

<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

<title>PostGIS Example</title>

<meta name="viewport" content="width=device-width, initial-scale=1.0">

<link rel="stylesheet" href="https://unpkg.com/leaflet@1.7.1/dist/leaflet.css" integrity="sha512-xodZBNTC5n17Xt2atTPuE1HxjVMSvLVW9ocqUKLsCC5CXdbqCmblAshOMAS6/keqq/sMZMZ19scR4PsZChSR7A==" crossorigin=""/>

<script src="https://unpkg.com/leaflet@1.7.1/dist/leaflet.js" integrity="sha512-XQoYMqMTK8LvdxXYG3nZ448hOEQiglfqkJs1NOQV44cWnUrBc8PkAOcXy20w0vlaXaVUearIOBhiXZ5V3ynxwA==" crossorigin=""></script>

<style>

html, body {

height: 100%;

margin: 0;

}

</style>

</head>

<body>

<div id="map" class="leaflet-container leaflet-touch leaflet-retina leaflet-fade-anim leaflet-grab leaflet-touch-drag leaflet-touch-zoom" tabindex="0" ></div>

<p id="stats"></p>

<script>

// helper function for fetching json from the BE

function fetchJSONFile(path, callback) {

var httpRequest = new XMLHttpRequest();

httpRequest.onreadystatechange = function() {

if (httpRequest.readyState === 4) {

if (httpRequest.status === 200) {

var data = JSON.parse(httpRequest.responseText);

if (callback) callback(data);

}

};

httpRequest.open('GET', path);

httpRequest.send();

}

// setup a timer

setInterval(onTimer, 60 * 1000); // once a minute

function onTimer() {

if (navigator.geolocation) {

navigator.geolocation.getCurrentPosition(updateLocation);

} else {

x.innerHTML = "Geolocation is not supported by this browser.";

}

updateWalkStats();

}

// call BE to update the current location

function updateLocation(position) {

var httpRequest = new XMLHttpRequest();

httpRequest.onreadystatechange = function() {

if (httpRequest.readyState === 4) {

console.log("Updated location, status: " + httpRequest.status)

}

};

httpRequest.open('POST', "http://mypostgisexample.com/locations/" + position.coords.latitude + "/" + position.coords.longitude);

httpRequest.send();

}

function updateWalkStats() {

fetchJSONFile("http://mypostgisexample.com/walk-stats", function(data) {

document.getElementById("stats").innerHTML = "ascent: " + data.totalAscent + "m<br>descent: "+ data.totalDescent + "m<br>distance: " + data.totalDistanceKm + "km<br>avg speed: " + data.avgSpeedKmh + "kmh";

});

}

// setup google satellite tiles

var googleSat = L.tileLayer('http://{s}.google.com/vt/lyrs=s,h&x={x}&y={y}&z={z}',{

maxZoom: 20,

subdomains:['mt0','mt1','mt2','mt3']

});

// initialize the map

var map = L.map('map', {

center: [43.36054, 22.57925],

zoom: 11,

layers: [googleSat]

});

var baseLayers = {

"Google Satellite": googleSat

};

var overlays = {

};

// fetch the walk route geojson

console.log("Loading walk route started.");

fetchJSONFile("http://mypostgisexample.com/walk-route", function(data) {

console.log("Loading walk route finished.");

// create a leaflet layer using the parsed GeoJSON features

var walkRoute = L.geoJSON(data.features, {

style: function(feature) {

return {

"color": "#FF0000",

"weight": 3,

"opacity": 0.8,

"fillOpacity": 0

}

}).addTo(map);

overlays["Walk Route"] = walkRoute;

// center map to the walking route

map.fitBounds(walkRoute.getBounds());

// configure map layers once we have all the data ready

L.control.layers(baseLayers, overlays).addTo(map);

});

updateWalkStats();

</script>

</body>

</html>

Please note that “mypostgisexample.com” in the code above should be replaced with the actual URL of the back-end.

When the page loads on a mobile device it will look similar to the screenshot below, provided that the back-end is functioning and returning results. Most of the screen is covered by the map with the walking route rendered above the satellite imagery. In the top-left corner, we have zoom in/out controls and the top-right corner shows a layer selection tool. Finally, at the bottom of the screen, we have statistics coming from the back-end.

Conclusion

In this article, we demonstrated how to combine vector and raster spatial data and apply some basic spatial operations to them to solve an interesting real-world problem. On top of the database layer, we also built a minimalistic Java SpringBoot back-end and HTML/JavaScript front-end for a complete solution. However, with this example, we are only scratching the surface of the vast universe of GIS problems and techniques that can be used to solve them. Hopefully, it can be used as a starting point for further research.

About the author

Branislav Stojkovic is a senior software engineer with more than 15 years of industry experience working at our engineering hub in Niš.

Branislav has extensive hands-on experience in designing, building, and maintaining large enterprise systems, mostly in the financial industry, and strong technical knowledge in back-end development using Java and .NET technologies and database modeling. The financial sector is often volatile and requires a deep understanding of the domain knowledge, and Branislav can dive deep into the details but also have a broad, high-level view of the whole system and the roadmap for the future. Agile methodologies are often employed to cope with the volatility of the financial industry. This search for optimal solutions started at an early stage of his career while working with algorithm design and competing in high school and university competitions.

Contact us if you have any questions about our company or products.

We will try to provide an answer within a few days.