@dain dain commented Jan 8, 2026

Description

Migrate the geospatial plugin from ESRI geometry-api-java to JTS (Java Topology Suite) as the core geometry library. JTS is more widely used, better maintained, and provides the foundation for upcoming Iceberg geometry type support.

Key changes:

  • Use JTS Geometry as the native stack type instead of serialized bytes
  • Replace custom geometry serialization with standard EWKB format
  • Convert all geometry functions (ST_*, Bing tiles, aggregations, spatial joins) to JTS
  • Simplify Hadoop ESRI JSON reader and PostgreSQL connector geometry handling

Additional context and related issues

JTS is the de facto standard geometry library in the Java ecosystem, used by GeoTools and Apache Sedona; PostGIS builds on GEOS, the C++ port of JTS. This change aligns Trino's geospatial implementation with the broader ecosystem and enables future improvements like Iceberg geometry support.

Reviewer Guide

Commit Organization

The migration is structured to minimize risk and make review manageable:

  1. Test infrastructure first: Initial commits separate Repairs (fixing invalid OGC inputs
    like unclosed rings that JTS correctly rejects) from Refactors (adding assertSpatialEquals to
    handle harmless differences like vertex ordering). Any test changes in subsequent commits indicate
    actual behavior differences between ESRI and JTS.
  2. Incremental function conversion: Functions are ported in logical groups:
    • Basic geometry functions (ST_Boundary, ST_Buffer, etc.)
    • Accessor functions (ST_NumPoints, etc.)
    • EncodedPolyline, BingTile, aggregation functions
    • ST_Union and remaining GeoFunctions
    • Spatial join and envelope handling
  3. Serialization change: Replaces custom binary format with EWKB (Extended Well-Known Binary),
    the PostGIS standard that preserves SRID.
  4. Hadoop reader: Converted separately due to Maven dependency constraints.
  5. Import cleanup: With ESRI removed, JTS classes are imported normally, replacing the fully qualified
    names that were needed while both libraries coexisted. Please do not comment on the fully qualified names in earlier commits.
  6. Stack type change: Final commit switches from Slice to JTS Geometry as the native stack type,
    eliminating serialization cycles between function calls.

Behavior Differences

Test changes document intentional behavior differences:

  • ST_Buffer approximation: JTS uses 8 segments per quadrant by default (the same default as
    PostGIS/GEOS), whereas ESRI used 24. Buffer polygons therefore have fewer vertices (e.g., 32 vs 96
    for a point buffer) and approximate curved boundaries slightly more coarsely.
  • ST_NumPoints: Now counts the closing vertex of polygons, complying with the OGC standard (where a
    triangle has 4 points: A-B-C-A).
  • ST_Boundary: Returns LINESTRING instead of MULTILINESTRING for simple polygons (simpler, more
    standard return type).
  • Empty Geometries: ST_Buffer(infinity) returns POLYGON EMPTY (instead of MULTIPOLYGON EMPTY).
  • ST_Union: Returns an empty GEOMETRYCOLLECTION instead of null for empty inputs, improving
    null-safety in downstream SQL.
  • Floating Point: Minor precision differences in ST_Centroid and ST_Area due to different
    underlying math implementations (verified via tolerance checks).
  • Vertex Ordering: Polygons are now normalized to standard winding (e.g., clockwise for exterior
    rings) during serialization, which may change the order of vertices returned by ST_Points.
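To make the ST_Buffer difference listed above concrete, here is a minimal sketch in plain Java (no JTS or ESRI dependency) that approximates a point buffer with a regular polygon. The class and method names are hypothetical and this is not the buffer algorithm of either library; it only illustrates how the segments-per-quadrant setting determines vertex count and how closely the polygon area approaches that of a true circle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: a point buffer approximated by a regular polygon
// with `quadrantSegments` straight segments per quarter circle.
final class BufferApproximationSketch {
    // Returns the distinct vertices of the polygon (the closing vertex is not repeated).
    static List<double[]> pointBuffer(double x, double y, double radius, int quadrantSegments) {
        int n = 4 * quadrantSegments;
        List<double[]> vertices = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            double angle = 2 * Math.PI * i / n;
            vertices.add(new double[] {x + radius * Math.cos(angle), y + radius * Math.sin(angle)});
        }
        return vertices;
    }

    // Shoelace formula; the ring is treated as implicitly closed.
    static double area(List<double[]> ring) {
        double sum = 0;
        for (int i = 0; i < ring.size(); i++) {
            double[] a = ring.get(i);
            double[] b = ring.get((i + 1) % ring.size());
            sum += a[0] * b[1] - b[0] * a[1];
        }
        return Math.abs(sum) / 2;
    }

    // Relative shortfall of the polygon area compared to a unit circle.
    static double relativeAreaError(int quadrantSegments) {
        return (Math.PI - area(pointBuffer(0, 0, 1, quadrantSegments))) / Math.PI;
    }

    public static void main(String[] args) {
        for (int segments : new int[] {8, 24}) {
            System.out.printf("%d segments/quadrant: %d vertices, area error %.2f%%%n",
                    segments, pointBuffer(0, 0, 1, segments).size(), 100 * relativeAreaError(segments));
        }
    }
}
```

Under these assumptions, 8 segments per quadrant gives 32 vertices and 24 gives 96. The polygon area falls short of the circle area by roughly 0.6% and 0.07% respectively, which is why buffer tests that compare areas need a tolerance.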

Release notes

(x) Release notes are required, with the following suggested text:

## Geospatial
  * Replace ESRI geometry library with JTS for improved ecosystem compatibility. ({issue}`issuenumber`)
  * WKT parsing is now stricter per OGC standards and rejects previously accepted invalid syntax. ({issue}`issuenumber`)
  * `ST_Union` edge case changes: empty inputs return empty geometry collection instead of null, and point-on-line unions no longer insert vertices at intersection points. ({issue}`issuenumber`)

@cla-bot added the cla-signed label Jan 8, 2026
@github-actions added the postgresql (PostgreSQL connector) label Jan 8, 2026
@dain force-pushed the user/dain/geo-jts branch from b4e8c12 to aad9700 January 9, 2026 01:13
@github-actions added the hive (Hive connector) label Jan 9, 2026
@dain force-pushed the user/dain/geo-jts branch 7 times, most recently from fe6e6cd to b3494aa January 10, 2026 02:11
dain added 15 commits January 9, 2026 20:19
Fix test data that was accepted by ESRI but rejected by JTS, which
strictly enforces the OGC Simple Features Specification:

- Close polygon rings (first point must equal last point)
- Fix single-point LINESTRING to have two points (minimum required)
- Fix MULTILINESTRING EMPTY syntax (remove extra parentheses)
- Replace invalid MULTIPOLYGON with overlapping polygons using ST_Union
- Replace degenerate polygons in GEOMETRYCOLLECTION with valid geometries
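
The first two rules in this list can be sketched in plain Java as follows. This is an illustrative check only, with hypothetical class and method names; it is not the validation code in JTS or in this PR. It assumes a ring needs at least four points with the first equal to the last, and a line string needs at least two points.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the ring-closure and minimum-point rules.
final class RingClosureSketch {
    static boolean samePoint(double[] a, double[] b) {
        return a[0] == b[0] && a[1] == b[1];
    }

    // A ring is closed when its first and last points coincide.
    static boolean isClosed(List<double[]> points) {
        return !points.isEmpty() && samePoint(points.get(0), points.get(points.size() - 1));
    }

    // Returns a copy of the input with the first point appended if the ring is open.
    static List<double[]> closeRing(List<double[]> points) {
        List<double[]> closed = new ArrayList<>(points);
        if (!points.isEmpty() && !isClosed(points)) {
            closed.add(points.get(0).clone());
        }
        return closed;
    }

    // A closed triangle is the smallest ring: A-B-C-A, four points.
    static boolean isValidRing(List<double[]> points) {
        return points.size() >= 4 && isClosed(points);
    }

    static boolean isValidLineString(List<double[]> points) {
        return points.size() >= 2;
    }

    public static void main(String[] args) {
        List<double[]> open = List.of(new double[] {0, 0}, new double[] {1, 0}, new double[] {0, 1});
        System.out.println("open ring valid: " + isValidRing(open));
        System.out.println("closed ring valid: " + isValidRing(closeRing(open)));
    }
}
```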

Adds assertSpatialEquals helper to TestGeoFunctions that uses
stEquals for geometry comparison. Converts testSTGeometryType
and testSTBuffer to use the new helper.

testSTBuffer was updated to use property-based assertions (ST_Envelope
and ST_Area with tolerance) instead of exact WKT coordinate matching.
This makes the tests stable across CPU architectures (ARM vs x86)
where trigonometric functions can produce slightly different
floating-point results.
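
As a rough illustration of why exact coordinate matching is brittle, the sketch below compares two polygon rings while ignoring start vertex, direction, and coordinate noise below a tolerance. It is plain Java with hypothetical names and is much narrower than the topological comparison that stEquals performs; it is not the assertSpatialEquals helper from this commit.

```java
// Hypothetical illustration: ring comparison that tolerates a different start
// vertex, reversed direction, and small floating-point differences.
final class RingComparisonSketch {
    // Both rings are closed (first point repeated as last point).
    static boolean sameRing(double[][] a, double[][] b, double tolerance) {
        int n = a.length - 1; // ignore the repeated closing vertex
        if (n < 3 || b.length - 1 != n) {
            return false;
        }
        for (int start = 0; start < n; start++) {
            if (matches(a, b, n, start, 1, tolerance) || matches(a, b, n, start, -1, tolerance)) {
                return true;
            }
        }
        return false;
    }

    // Checks whether a[i] lines up with b[start + step * i] for every vertex.
    private static boolean matches(double[][] a, double[][] b, int n, int start, int step, double tolerance) {
        for (int i = 0; i < n; i++) {
            double[] p = a[i];
            double[] q = b[Math.floorMod(start + step * i, n)];
            if (Math.abs(p[0] - q[0]) > tolerance || Math.abs(p[1] - q[1]) > tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] square = {{0, 0}, {1, 0}, {1, 1}, {0, 1}, {0, 0}};
        // Same square: different start vertex, opposite direction, tiny noise.
        double[][] variant = {{1 + 1e-12, 1}, {1, 0}, {0, 0}, {0, 1}, {1 + 1e-12, 1}};
        System.out.println(sameRing(square, variant, 1e-9));
    }
}
```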

Migrate simple geometry functions to use JTS library.

Test updates for behavior differences:
- ST_Boundary returns LINESTRING instead of MULTILINESTRING for simple polygons
- ST_Buffer with infinity returns POLYGON EMPTY instead of MULTIPOLYGON EMPTY
- Minor floating-point precision differences in some calculations

Migrate ST_NumPoints and related accessor functions to JTS.

Test updates for behavior differences:
- ST_NumPoints now counts closing vertices in polygons per OGC standard
- Ring vertex ordering may differ cosmetically (same geometry)
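
Both differences can be reproduced without any geometry library. The following sketch, in plain Java with hypothetical names, counts the points of a closed triangle ring including its closing vertex and uses the signed (shoelace) area to detect and normalize winding direction. It assumes a y-up coordinate system and is not the JTS implementation.

```java
// Hypothetical illustration of point counting and winding normalization.
final class RingOrientationSketch {
    // Signed shoelace area of a closed ring: positive for counter-clockwise,
    // negative for clockwise (y axis pointing up).
    static double signedArea(double[][] ring) {
        double sum = 0;
        for (int i = 0; i < ring.length - 1; i++) {
            sum += ring[i][0] * ring[i + 1][1] - ring[i + 1][0] * ring[i][1];
        }
        return sum / 2;
    }

    static boolean isClockwise(double[][] ring) {
        return signedArea(ring) < 0;
    }

    // Returns the ring unchanged if it is already clockwise, else a reversed copy.
    static double[][] toClockwise(double[][] ring) {
        if (isClockwise(ring)) {
            return ring;
        }
        double[][] reversed = new double[ring.length][];
        for (int i = 0; i < ring.length; i++) {
            reversed[i] = ring[ring.length - 1 - i];
        }
        return reversed;
    }

    // The closing vertex is part of the ring, so a triangle has four points.
    static int numPoints(double[][] ring) {
        return ring.length;
    }

    public static void main(String[] args) {
        double[][] triangle = {{0, 0}, {1, 0}, {0, 1}, {0, 0}}; // A-B-C-A, counter-clockwise
        double[][] normalized = toClockwise(triangle);           // A-C-B-A
        System.out.println(numPoints(triangle) + " points, clockwise after normalization: "
                + isClockwise(normalized));
    }
}
```

The reversed ring describes the same triangle with the same absolute area; only the vertex order differs, which is why such changes are cosmetic.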

Add JTS-compatible overloads for geometry utility methods to support
incremental migration from ESRI to JTS. The ESRI versions remain for
existing callers until they are converted.

Rewrite stUnion to use JTS UnaryUnionOp instead of ESRI cursors.

Behavior differences:
- Point-on-line union does not insert vertices
- Empty inputs return empty geometry collection instead of null

- Migrate spatial join operator to JTS for intersection and
  containment tests
- Switch GeoFunctions envelope operations to use JTS Envelope
  (deserializeEnvelope, ST_XMin/XMax/YMin/YMax, ST_IsEmpty)

Use Extended Well-Known Binary (EWKB) format for geometry serialization.
EWKB is the standard used by PostGIS and retains the SRID (Spatial
Reference System Identifier) for coordinate system information.
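
For readers unfamiliar with the layout, here is a minimal sketch of how a 2D point with an SRID is laid out in EWKB, written with java.nio only. The class name and helpers are hypothetical and this is not the serializer added in this commit; it assumes the PostGIS EWKB convention of a byte-order marker, a geometry type word with the 0x20000000 flag set when an SRID follows, the SRID, and then the coordinates.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration of the EWKB layout for a 2D point with an SRID.
final class EwkbPointSketch {
    private static final int WKB_POINT = 1;
    private static final int EWKB_SRID_FLAG = 0x20000000;

    static byte[] encodePoint(double x, double y, int srid) {
        ByteBuffer buffer = ByteBuffer.allocate(1 + 4 + 4 + 8 + 8).order(ByteOrder.LITTLE_ENDIAN);
        buffer.put((byte) 1);                      // byte order marker: 1 = little-endian
        buffer.putInt(WKB_POINT | EWKB_SRID_FLAG); // geometry type with "has SRID" flag
        buffer.putInt(srid);
        buffer.putDouble(x);
        buffer.putDouble(y);
        return buffer.array();
    }

    // Reads the SRID back, returning 0 when the SRID flag is not set.
    static int decodeSrid(byte[] ewkb) {
        ByteBuffer buffer = ByteBuffer.wrap(ewkb);
        buffer.order(buffer.get() == 1 ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        int type = buffer.getInt();
        return (type & EWKB_SRID_FLAG) != 0 ? buffer.getInt() : 0;
    }

    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] ewkb = encodePoint(1, 2, 4326);
        // prints 0101000020E6100000000000000000F03F0000000000000040
        System.out.println(toHex(ewkb));
        System.out.println("SRID: " + decodeSrid(ewkb));
    }
}
```

Plain WKB omits the flag and the four SRID bytes, so the coordinate reference is lost on a round trip; carrying it inline is what lets the serialized value retain the SRID.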

Note: TestEsriTable's expected values file was converted from Trino's
old internal binary format to WKT. This change cannot be separated
into an earlier commit because the old format's deserializer was
deleted in the EWKB commit, and circular Maven dependencies prevent
adding geospatial as a test dependency to trino-hive.

With ESRI removed, JTS objects no longer need fully qualified names.

Change the internal representation of geometry values to use JTS
Geometry objects directly, avoiding unnecessary serialization cycles
between function calls.
@dain force-pushed the user/dain/geo-jts branch from b3494aa to 9bec6d6 January 10, 2026 04:19
@dain marked this pull request as ready for review January 10, 2026 05:57