Testing Spatial Data with DbUnit

feyenoord-2
Working with geographic data and spatial databases is fun. With Hibernate Spatial it is also easy to create spatial queries that are standardized and database independent. But testing spatial queries with DbUnit does not work out of the box. In this article I will present a solution with just a simple hook in DbUnit.

Let’s get started

I like examples, so let me show you the problem based on a simple example. Last week I went to a game of my favorite soccer team and we won. I wanted to celebrate this in a pub nearby. Assume we have a table in the database containing all bars with their location. Then I need a service that returns all bars in the neighbourhood of my current location. Let’s start with the dataset we are going to use for the test. We are using DbUnit to test the service, so we can use a FlatXmlDataSet to load the test data. The dataset is listed below.

<?xml version="1.0" encoding="UTF-8"?>
<dataset>
    <Bar id="0" name="Cafe De Pijp" location="???"/>
    <Bar id="1" name="Cafe 't Haantje" location="???"/>
    <Bar id="2" name="Cafe The Hide Away" location="???"/>
</dataset>

The id and name are easy to load. But how do we load the spatial data? Let that rest for a moment. I will come back to that later. The next thing we can do is write the test:

@Test
@DatabaseSetup("knownBars-flatXml.xml")
public void findByLocationShouldReturnAllBarsWithinRange() throws Exception {
    List bars = this.barRepository.findByLocation(
            new Coordinate(100, 100), 50);
    assertThat(extractProperty("name").from(bars))
            .containsOnly("Cafe 't Haantje", "Cafe The Hide Away");
}

The test is fairly easy. With the annotation @DatabaseSetup we load the test data. We call our service and expect that the only bars that are returned are the bars nearby.

Binary data

The problem with the spatial data is that it is stored in binary format in the database. So for the test to load the data correctly, we have to fill the location column with binary data in our test dataset. For Point(20 20) the binary representation is listed below. This is not what we call a readable dataset.

<?xml version="1.0" encoding="UTF-8"?>
<dataset>
    <Bar id="0" name="Cafe De Pijp" 
        location="ACAAAAEAAHpEQDQAAAAAAABANAAAAAAAAA=="/>
    ...
</dataset>

Another problem is, how did we know that this was the binary representation of this particular geo location? (Actually I created the Bar entity in java and used hibernate logging to look at the data that hibernate stored). Wouldn’t it be nice if geo data could be stored using a human readable format? WKT to the rescue.

Well-known text (WKT)

Well-known text (WKT) is a text markup language for representing vector geometry objects. In total, there are 18 distinct geometric objects that can be represented:

• Geometry
• Point, MultiPoint
• LineString, MultiLineString
• Polygon, MultiPolygon, Triangle
• CircularString
• Curve, MultiCurve, CompoundCurve
• CurvePolygon
• Surface, MultiSurface, PolyhedralSurface
• TIN
• GeometryCollection

This is good news! For example to represent a point we write POINT(20 20) in our dataset and somewhere we transform this to the binary format used to store the object to the database. But we have one more problem to tackle. If we use POINT(20 20) in our dataset, how do we know that we are dealing with a geo column and not the string POINT(20 20)? In DbUnit’s FlatXmlDataSet the columns are not typed. The simplest solution I could think of was to prefix the WKT string by a fixed string GEOM_. So our test data looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<dataset>
    <Bar id="0" name="Cafe De Pijp" location="GEOM_POINT(20 20)"/>
    <Bar id="1" name="Cafe 't Haantje" location="GEOM_POINT(110 85)"/>
    <Bar id="2" name="Cafe The Hide Away" location="GEOM_POINT(80 100)"/>
</dataset>

Now we only need a hook in DbUnit to transform every column that starts with GEOM_. This is done by extending FlatXmlDataSet and overriding the row(final Object[] values) method:

public class SpatialAwareFlatXmlDataSet extends FlatXmlDataSet {
    public SpatialAwareFlatXmlDataSet(final InputStream inputStream) 
            throws DataSetException {
        super(new FlatXmlProducer(new InputSource(inputStream)));
    }

    @Override
    public void row(final Object[] values) throws DataSetException {
        convertGeometryColumnsFromTextToBinary(values);
        super.row(values);
    }

    private void convertGeometryColumnsFromTextToBinary(final Object[] values) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] != null && values[i].toString().startsWith("GEOM_")) {
                values[i] = GeometryUtil.convertFromTextToBinary(
                    values[i].toString().substring(5));
            }
        }
    }
}

For completeness here is the implementation of the service:

@SuppressWarnings("unchecked")
@Override
public List findByLocation(Coordinate startPoint, int range) {
    Session session = (Session)entityManager.getDelegate();
    org.hibernate.Query query = session.createQuery("SELECT b from Bar b " + 
        "where within(location, :filter) = true");
    Geometry circle = createCircleWithinRange(startPoint, range);
    query.setParameter("filter", circle, GeometryUserType.TYPE);
    return query.list();
}

private Geometry createCircleWithinRange(final Coordinate centerPoint, 
        final int range) {
    GeometricShapeFactory gsf = new GeometricShapeFactory();
    gsf.setSize(range);
    gsf.setNumPoints(100);
    gsf.setCentre(new Coordinate(centerPoint));
    return gsf.createCircle();
}

Hope this helps you. Let me know what you think of the solution or if you have an even better or cleaner solution.

The complete sources are available on github.