← Projects

R-tree Spatial Queries

A Java implementation of an R*-tree spatial index for efficient querying of OpenStreetMap (OSM) data, supporting range queries, k-nearest neighbor searches, and skyline queries with comprehensive performance analysis.

github.com/pompos02/R-treeQueries

Project Overview

This project implements an R*-tree spatial index in Java for efficient spatial querying of OpenStreetMap data (Malta). The system processes OSM XML files through a three-stage pipeline, builds spatial indexes, and supports multiple query types with comprehensive performance benchmarking against sequential scans.

  • Language: Java 8+ with custom spatial data structures
  • Data Source: OpenStreetMap (.osm) XML files
  • Core Algorithm: R*-tree spatial indexing for 2D point data
  • Query Types: Range queries, k-nearest neighbors, skyline queries
  • Performance: Comprehensive benchmarking with speedup analysis

Data Pipeline

The application follows a structured three-stage pipeline for processing OSM data into an efficient spatial index:

OSM File (.osm) → DataFile (datafile.dat) → IndexFile (indexfile.dat)
Stage 1
OSM Parsing
Stage 2
DataFile Creation
Stage 3
IndexFile Creation

Stage 1: OSM Parsing

  • Input: OpenStreetMap XML file containing spatial data
  • Output: List of Record objects with coordinates and metadata

Stage 2: DataFile Creation

  • Function: Organizes records into 32KB blocks for efficient storage
  • Output: datafile.dat - Binary file containing serialized spatial records

Stage 3: IndexFile Creation

  • Process: R*-tree construction via RStarTree or BulkLoadingRStarTree
  • Function: Builds spatial index structure pointing to datafile blocks
  • Output: indexfile.dat - Binary file containing R*-tree nodes

Architecture Types

1. Standard R*-tree (Incremental Loading)

  • Method: Records inserted one-by-one with dynamic tree restructuring
  • Characteristics:
    • Dynamic insertion with overflow treatment
    • Supports insertions after initial construction

2. Bulk Loading R*-tree

  • Method: Sorts all records and builds tree bottom-up
  • Characteristics:
    • More efficient initial construction
    • Better space utilization

Query Types

Range Query (BoundingBoxRangeQuery)

Finds all points within a rectangular bounding box. Efficient for spatial range selections.

K-Nearest Neighbors (NearestNeighboursQuery)

Finds the k closest points to a given coordinate. Optimized for proximity-based searches.

Skyline Query (SkylineQuery)

Identifies non-dominated points within a bounding box region. Useful for multi-criteria optimization.

Performance Analysis

Comprehensive benchmarking comparing R*-tree performance against sequential scan for range and KNN queries with variable parameters.

Range Query Performance

Total Iterations:
100
Average R* Tree Time:
538.21 ms
Average Sequential Scan Time:
459.65 ms
Average Speedup:
0.85x (sequential faster for large ranges)
Returned Records Range:
0 to 728,398
Range Query Performance Comparison

Range query performance comparison between R*-tree and sequential scan

Range Query Speedup Analysis

Speedup analysis for range queries showing when R*-tree outperforms sequential scan

K-Nearest Neighbors Performance

Total Iterations:
99
Average R* Tree Time:
16.82 ms
Average Sequential Scan Time:
449.21 ms
Average Speedup:
26.71x (R*-tree significantly faster)
Returned Records Range:
100 to 9,900 (k values)
KNN Query Performance Comparison

KNN query performance comparison showing R*-tree efficiency

KNN Query Speedup Analysis

Speedup analysis for KNN queries demonstrating spatial indexing benefits

Key Observations:

  • Range Queries: R*-tree faster for small ranges (few records), sequential scan better for large ranges (many records)
  • KNN Queries: R*-tree consistently outperforms sequential scan with speedup increasing as k grows (up to 26.71x faster)
  • Selective Queries: Spatial indexing excels in selective queries where only a subset of data needs to be examined
← Back to Projects