Trivia - Off-by-one errors and Interval notations
Dumping some info about working with intervals with a focus in implementation in Go. Date intervals in particular, and one dimensional integer intervals in general (if you want to be really formal).
We will denote an interval as [a .. b]
that represents all the numbers i
such that a ≤ i ≤ b
. In some cases it is actually neater to work with
“half-open” intervals, which are denoted as [a .. b)
, representing the set of
all numbers such that a ≤ i < b
. In fact this is how most programming
languages denote ‘slicing’ operations - Go and Python in particular.
1
In Go, we will have the following type
type Interval [2]int // {start, end} denoting the interval [start, end)
func (i Interval) Start() int {
return i[0]
}
func (i Interval) End() int {
return i[1]
}
Dates are equivalent to integers
Shocking, I know (/s). Every operation on integer intervals can also be
applied to date intervals just fine. In actual code, you can either use Go’s
time.Add
and time.Sub
methods and then divide by 24
hours and truncate.
Or you can keep a base-date and for the part of the code where you need to do
your interval calculations, work on integers and then do something like
resultDate = baseDate.Add(n * OneDayDuration)
where n
is the integer
corresponding to the date you’re interested in. While both approaches work, for
Go specifically it’s better to work on integers and convert later to dates. A
generic way would be nicer, but Go doesn’t support operator overloading, so even
with generics, it will not look that nice.
For completeness, finding overlaps in date-ranges is something that you can do
in Postgres using the overlaps
operator. For example, finding the number of
employees that are on leave for a given range of days from an employee_leaves
table
select count(distinct(leave.employee_id))
from employee_leaves leave
where (leave.start_date, leave.end_date) OVERLAPS
(@params_start_date, @params_end_date)
Note that Postgres’ OVERLAPS
operator works on closed intervals, so in above
query the intervals in question are [leave.start_date .. leave.end_date]
and
[@params_start_date .. @params_end_date]
.
Three problems
We will be
- Finding if two intervals overlap.
- Given a list of occupied intervals within a date-range, find the intervals that are unoccupied.
- Given an interval, remove some interval from within it, producing two new intervals.
Finding if two intervals overlap
Finding if two intervals overlap is a “primitive” that programmers have used
over and over in every domain imaginable. Given two closed intervals [a .. b]
and [c .. d]
, report if they overlap. It turns out, if we think about the
cases when the two intervals don’t overlap, the test becomes simpler. The following are the two cases:
We observe that only if the endpoint of one interval precedes the startpoint of the other interval, the two intervals are non-overlapping. Testing overlapping is now easy.
func (i1 Interval) DoesNotOverlap(i2 Interval) bool {
return (i1.Start() >= i2.End() || i2.Start() >= i1.End()
}
func (i1 Interval) Overlaps(i2 Interval) bool {
return !i1.DoesNotOverlap(i2)
}
Finding maximally free intervals
Here’s the scenario. You have a range of dates, say the 1 Jan to 31 Jan. You want to allocate some kind of campaigns that last for a few days in this month. But there are already other campaigns possibly allocated inside this month. Dates which are already have allocated/marked/whatever - we call them occupied dates and these form occupied intervals. Note that these pre-existing occupied intervals can be overlapping between themselves. We need to find all the unoccupied/free intervals within this month. Let’s draw a picture.
The above scenario has 13 days, and 3 occupied intervals I1
, I2
and I3
.
This results in 4 maximal free intervals of lengths 1, 1, 3 and 2. It should be
somewhat obvious why we call these maximally free. The third free interval in
above picture has length 3, but it can itself be broken into multiple smaller
free intervals, but we want the largest free intervals.
Here’s a sketch of an algorithm. Keep a variable that represents the potential
endpoint of the currently-being-built free interval, as you iterate through the
entire date range given. At each day i
, if i
is occupied, add the
currently-being-built-free interval to your list of maximal free intervals but
only if the currently-being-built interval is non-empty.
The “if i
is occupied” check above can be done in a simple loop, iterating
over each of the given intervals, although a more sophisticated data structure
like an interval tree can also be used. But it makes sense only if there are a
lot of occupied intervals to check against - on the order of at least thousand.
Otherwise, brute force should work fine. Here’s an implementation of this
algorithm.
Removing sub-intervals
Given two intervals [a .. b]
and [c .. d]
, we want to remove the overlapped
part from the first interval, and create a new interval. There are a couple of
cases, and we deal with them each.
-
The reason is if you denote a range of elements in a list as
[a:b]
(corresponding to the half open interval[a .. b)
in math notation, the length of that sub-range is simplyb - a
. If it instead represented a closed interval, that is the indexb
was included in the sub-range, the number of elements would beb - a + 1
. This pesky1
rears up its ugly head in so many places that it’s resulted in its own class of errors. ↩︎ -
The
@
values in the query comes from dynamic values as processed by an SQL driver/ORM - I’m using GORM’s conventions ↩︎