Trivia - Off-by-one errors and Interval notations

Dumping some info about working with intervals with a focus in implementation in Go. Date intervals in particular, and one dimensional integer intervals in general (if you want to be really formal).

We will denote an interval as [a .. b] that represents all the numbers i such that a ≤ i ≤ b. In some cases it is actually neater to work with “half-open” intervals, which are denoted as [a .. b), representing the set of all numbers such that a ≤ i < b. In fact this is how most programming languages denote ‘slicing’ operations - Go and Python in particular. 1

In Go, we will have the following type

type Interval [2]int // {start, end} denoting the interval [start, end)

func (i Interval) Start() int {
	return i[0]
}

func (i Interval) End() int {
	return i[1]
}

Dates are equivalent to integers

Shocking, I know (/s). Every operation on integer intervals can also be applied to date intervals just fine. In actual code, you can either use Go’s time.Add and time.Sub methods and then divide by 24 hours and truncate.

Or you can keep a base-date and for the part of the code where you need to do your interval calculations, work on integers and then do something like resultDate = baseDate.Add(n * OneDayDuration) where n is the integer corresponding to the date you’re interested in. While both approaches work, for Go specifically it’s better to work on integers and convert later to dates. A generic way would be nicer, but Go doesn’t support operator overloading, so even with generics, it will not look that nice.

For completeness, finding overlaps in date-ranges is something that you can do in Postgres using the overlaps operator. For example, finding the number of employees that are on leave for a given range of days from an employee_leaves table

select count(distinct(leave.employee_id))
from employee_leaves leave
where (leave.start_date, leave.end_date) OVERLAPS
       (@params_start_date, @params_end_date)

2

Note that Postgres’ OVERLAPS operator works on closed intervals, so in above query the intervals in question are [leave.start_date .. leave.end_date] and [@params_start_date .. @params_end_date].


Three problems

We will be

  1. Finding if two intervals overlap.
  2. Given a list of occupied intervals within a date-range, find the intervals that are unoccupied.
  3. Given an interval, remove some interval from within it, producing two new intervals.

Finding if two intervals overlap

Finding if two intervals overlap is a “primitive” that programmers have used over and over in every domain imaginable. Given two closed intervals [a .. b] and [c .. d], report if they overlap. It turns out, if we think about the cases when the two intervals don’t overlap, the test becomes simpler. The following are the two cases:

Case 1
Case 2

We observe that only if the endpoint of one interval precedes the startpoint of the other interval, the two intervals are non-overlapping. Testing overlapping is now easy.

func (i1 Interval) DoesNotOverlap(i2 Interval) bool {
    return (i1.Start() >= i2.End() || i2.Start() >= i1.End()
}

func (i1 Interval) Overlaps(i2 Interval) bool {
	return !i1.DoesNotOverlap(i2)
}

Finding maximally free intervals

Here’s the scenario. You have a range of dates, say the 1 Jan to 31 Jan. You want to allocate some kind of campaigns that last for a few days in this month. But there are already other campaigns possibly allocated inside this month. Dates which are already have allocated/marked/whatever - we call them occupied dates and these form occupied intervals. Note that these pre-existing occupied intervals can be overlapping between themselves. We need to find all the unoccupied/free intervals within this month. Let’s draw a picture.

Example

The above scenario has 13 days, and 3 occupied intervals I1, I2 and I3. This results in 4 maximal free intervals of lengths 1, 1, 3 and 2. It should be somewhat obvious why we call these maximally free. The third free interval in above picture has length 3, but it can itself be broken into multiple smaller free intervals, but we want the largest free intervals.

Here’s a sketch of an algorithm. Keep a variable that represents the potential endpoint of the currently-being-built free interval, as you iterate through the entire date range given. At each day i, if i is occupied, add the currently-being-built-free interval to your list of maximal free intervals but only if the currently-being-built interval is non-empty.

The “if i is occupied” check above can be done in a simple loop, iterating over each of the given intervals, although a more sophisticated data structure like an interval tree can also be used. But it makes sense only if there are a lot of occupied intervals to check against - on the order of at least thousand. Otherwise, brute force should work fine. Here’s an implementation of this algorithm.

Removing sub-intervals

Given two intervals [a .. b] and [c .. d], we want to remove the overlapped part from the first interval, and create a new interval. There are a couple of cases, and we deal with them each.


  1. The reason is if you denote a range of elements in a list as [a:b] (corresponding to the half open interval [a .. b) in math notation, the length of that sub-range is simply b - a. If it instead represented a closed interval, that is the index b was included in the sub-range, the number of elements would be b - a + 1. This pesky 1 rears up its ugly head in so many places that it’s resulted in its own class of errors↩︎

  2. The @ values in the query comes from dynamic values as processed by an SQL driver/ORM - I’m using GORM’s conventions ↩︎