Pagination

Sharding-Sphere supports the paging query of MySQL, PostgreSQL and Oracle. For SQLServer, due to its complex paging query, only partial queries can be executed.

Pagination performance

Performance bottleneck

The SQLs with excessive paging offset can result in the poor performance of the database. Take MySQL for example:

SELECT * FROM t_order ORDER BY id LIMIT 1000000, 10

This SQL makes MySQL to retrieve 10 records after skipping over 1000000 records, if no suitable index can be used. In the case of Sharding (assuming two databases for sharding), in order to ensure the result are correct, the SQL will be rewritten as:

SELECT * FROM t_order ORDER BY id LIMIT 0, 1000010

The records before the offset are taken out, and only the last 10 of all the records ordered are obtained. If the database itself is stressed, the SQL will further exacerbate the performance bottleneck, for the original SQL only needs to send 10 records to the client, but the rewritten SQL will send 1000010*2 to the client.

Optimization in Sharding-Sphere

Two of aspects is optimized in Sharding-Sphere.

First, uses streaming + merge sort to avoid excessive memory consumption. Sharding-Sphere uses the rewrote SQLs for the query, and necessarily takes up additional bandwidth, but does not cause the memory a sharp rise. Most of people think that Sharding-Sphere will load 1000010 * 2 records into memory, which will take up a lot of memory and cause a memory leak. However, because the records of each result set are ordered, Sharding-Sphere only compares the current record of each sharding at a time, and only saves the cursor of the current record in memory. The time complexity of merge sort is only O(n), and the loss of performance is very small.

Second, Sharding-Sphere also optimizes queries that are only routed to a single slice. The query routing to a single slice can ensure the result are correct without the SQL rewriting. Therefore, Sharding-Sphere will save the bandwidth by means of not rewriting SQL.

Better solution of pagination

Since the LIMIT queries the data not using the index, a better solution is that you operate paging by using ID, if the ID is sequential:

SELECT * FROM t_order WHERE id > 100000 AND id <= 100010 ORDER BY id

Or you can query the next page of data by recording the ID of the last record of the last query:

SELECT * FROM t_order WHERE id > 100000 LIMIT 10

Paging for both Oracle and SQLServer need to be processed through subquery, and Sharding-Sphere supports paging-related subquery。

  • Oracle

Sharding-Sphere supports to use rownum for paging:

SELECT * FROM (SELECT row_.*, rownum rownum_ FROM (SELECT o.order_id as order_id FROM t_order o JOIN t_order_item i ON o.order_id = i.order_id) row_ WHERE rownum <= ?) WHERE rownum > ?

At present, it does not supports rownum + BETWEEN for paging.

  • SQLServer

Sharding-Sphere supports TOP + ROW_NUMBER() OVER for paging:

SELECT * FROM (SELECT TOP (?) ROW_NUMBER() OVER (ORDER BY o.order_id DESC) AS rownum, * FROM t_order o) AS temp WHERE temp.rownum > ? ORDER BY temp.order_id

Sharding-Sphere also supports OFFSET FETCH in SQLServer 2012 or above for paging:

SELECT * FROM t_order o ORDER BY id OFFSET ? ROW FETCH NEXT ? ROWS ONLY

It does not support WITH xxx AS (SELECT …) or two TOP + subquery for paging. Because paging statements generated by Hibernate in SQLServer use the WITH statement, Sharding-Sphere currently does not support SQLServer paging based on Hibernate.

  • MySQL, PostgreSQL

Both MySQL and PostgreSQL support LIMIT for paging, and subquery is not needed:

SELECT * FROM t_order o ORDER BY id LIMIT ? OFFSET ?