Conceptual Model

The full specification of the TimeToAlign! cross-domain timeline model

This page presents the complete conceptual model underlying TimeToAlign!. The accompanying TISMIR manuscript introduces these concepts through five worked examples; this page serves as the authoritative reference for formal definitions and visual conventions.

The model is organised in six layers, each building on the previous:

  1. Foundations: Domains, timelines, coordinates, instants, events
  2. Flow control: Contiguity, breaks, jumps
  3. Conversion maps: Coordinate transformations
  4. Nested timelines: Children, segments, timestamps
  5. Regions and derivative timelines: Named parts, traversals, warps
  6. Alignment: MatchClaims, AlignmentAnchors, MatchGraph, MatchLine, WarpMap, AlignmentBundle

Foundations

The timeline

A timeline is a positive coordinate axis minimally defined by two properties:

  • an origin (the zero coordinate), and
  • a measuring unit that determines the type of timeline.

The measuring unit places each timeline in one of three temporal domains (physical, logical, or graphical), each of which admits both continuous and discrete representations. This yields six fundamental timeline types:

Figure 1: The six timeline types. Each row corresponds to a temporal domain; each column to a number type (continuous or discrete). The arrow convention is used throughout: timelines are drawn as arrows progressing from left to right.
Table 1: The six timeline types and their typical measuring units.
Domain | Modality | Continuous | Discrete
Physical | Hearing | seconds, milliseconds, minutes | samples
Logical | Conceptualising | quarter notes, beats | ticks, divisions
Graphical | Seeing | metres, centimetres, points | pixels

The physical domain captures acoustic time as experienced through hearing. The logical domain captures symbolic musical time—the temporal structure encoded in a score, independent of any particular performance or visual rendering. The graphical domain captures spatial representations of time: images, pages, and pixel coordinates.

A key observation: notation is both graphical and logical. A printed score is, in effect, an alignment between a graphical timeline (pixel coordinates on the page) and a logical timeline (quarter notes, beats, measures). This duality is central to the model.

Note: On “temporal” in the graphical domain

Images are not themselves temporal objects. The graphical domain is called “temporal” because graphical artefacts encode time spatially—a spectromorphological analysis maps seconds to pixels, a score maps beats to horizontal position. The timeline models this spatial encoding of time, not a time dimension intrinsic to the image.

Video and animated scores straddle the physical and graphical domains: they are physical renderings (at a frame rate, in seconds) of graphical content.

Coordinates and instants

Definition: Coordinate

A coordinate represents a time-point in terms of its distance from the timeline’s origin as a positive real number associated with the timeline’s measuring unit.

A coordinate is the model’s most basic quantity: a scalar value paired with a unit (e.g., 3.5 seconds, 480 ticks, 127 pixels). In the implementation, coordinates additionally carry a number type (integer, float, or fraction) to prevent silent rounding errors.
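The following is a minimal sketch of a coordinate as a value paired with a unit. It uses Python's standard `fractions.Fraction` as the exact number type. The `Coordinate` class is a hypothetical illustration, not the timetoalign API. It shows why the number type matters: binary floats cannot represent 0.1 exactly, whereas fractions can.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Union

Number = Union[int, float, Fraction]


@dataclass(frozen=True)
class Coordinate:
    """Hypothetical sketch: a scalar distance from the origin plus its unit."""

    value: Number
    unit: str  # e.g. "seconds", "ticks", "pixels"

    def __post_init__(self) -> None:
        if self.value < 0:
            raise ValueError("a coordinate cannot lie before the origin")

    @property
    def number_type(self) -> type:
        return type(self.value)

    def __add__(self, other: "Coordinate") -> "Coordinate":
        # Refuse to mix units silently; converting units is the job of a C-map.
        if other.unit != self.unit:
            raise ValueError(f"cannot add {other.unit} to {self.unit}")
        return Coordinate(self.value + other.value, self.unit)


as_float = Coordinate(0.1, "seconds") + Coordinate(0.2, "seconds")
as_fraction = Coordinate(Fraction(1, 10), "seconds") + Coordinate(Fraction(2, 10), "seconds")
print(as_float.value == 0.3)                 # False: rounding error
print(as_fraction.value == Fraction(3, 10))  # True: exact
```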

Definition: Instant

An instant associates a coordinate (a “time-point”) with a signification such as “start of event e.”

Everything a timeline accommodates is associated with a coordinate through an Instant. Two Instants sharing an identical coordinate are synchronous but not identical—the instant at which a note ends is ontologically separate from the instant at which the next note begins, even when both share the same coordinate.

Visualisation: Instant

An Instant is represented as a wedge pointing at the timeline, labelled with a coordinate value.

Two principal sub-types are distinguished:

  • StartInstant: marks the beginning of a TimeInterval.
  • EndInstant: marks the end of a TimeInterval.

Together, a StartInstant and an EndInstant define a TimeInterval: a half-open interval [s, e) that is left-inclusive and right-exclusive. This convention ensures that consecutive intervals (e.g., successive quarter notes) tile the timeline without gaps or overlaps: the first quarter note occupies [0, 1), the second [1, 2), and so on.

Constraint: The EndInstant’s coordinate must be greater than or equal to the corresponding StartInstant’s coordinate.
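A few lines of Python are enough to sketch the half-open convention and the difference between synchrony and identity. The `Instant` and `TimeInterval` classes below are hypothetical illustrations, not the library's classes. Equality is deliberately left as object identity. As a result, two Instants at the same coordinate are synchronous without being the same Instant.

```python
import itertools
from dataclasses import dataclass, field
from fractions import Fraction

_next_id = itertools.count()


@dataclass(eq=False)  # eq=False keeps identity-based equality (hypothetical sketch)
class Instant:
    coordinate: Fraction
    signification: str
    uid: int = field(default_factory=lambda: next(_next_id))


def synchronous(a: Instant, b: Instant) -> bool:
    """Strict synchrony: identical coordinates (on the same timeline)."""
    return a.coordinate == b.coordinate


@dataclass(eq=False)
class TimeInterval:
    start: Instant
    end: Instant

    def __post_init__(self) -> None:
        if self.end.coordinate < self.start.coordinate:
            raise ValueError("EndInstant must not precede StartInstant")

    def contains(self, c) -> bool:
        """Half-open membership: s <= c < e."""
        return self.start.coordinate <= c < self.end.coordinate


q1 = TimeInterval(Instant(Fraction(0), "start of q1"), Instant(Fraction(1), "end of q1"))
q2 = TimeInterval(Instant(Fraction(1), "start of q2"), Instant(Fraction(2), "end of q2"))
print(synchronous(q1.end, q2.start), q1.end is q2.start)  # True False
print(q1.contains(1), q2.contains(1))                      # False True
```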

Events

Definition: Event

We call anything associated with a timeline via an Instant or a TimeInterval an event.

This is a deliberately broad category. The model is agnostic to what events represent—they may be notes, rests, annotations, audio segments, bounding boxes, or any other entity that possesses a temporal location. Two principal sub-types:

  • InstantEvent: defined by a single Instant (zero duration).
  • TimeIntervalEvent: defined by a StartInstant and an EndInstant (non-zero duration).

Events may carry arbitrary properties (pitch, dynamics, text labels, file paths) that are relevant for querying and matching but are not part of the temporal model itself.

Figure 2: Events on a timeline. InstantEvents are marked by a single wedge; TimeIntervalEvents span a region between two wedges (start and end).

Basic operations

A timeline supports four basic operations:

  1. Addition: inserting an Instant (and thereby its associated event).
  2. Removal: deleting an Instant.
  3. Lookup: retrieving all Instants at a given coordinate.
  4. Query: retrieving all TimeIntervals spanning a given coordinate.

The length of a timeline is the distance between its origin and its last coordinate with at least one Instant. A timeline may be locked, preventing the addition of Instants beyond the last existing time-point.
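The sketch below covers the four operations, the length, and the locking behaviour. It assumes Instants are stored in a dictionary keyed by coordinate. The `Timeline` class and its method names are hypothetical. `query` follows this page's definition of spanning (s < c < e), so an interval does not span its own start or end.

```python
class Timeline:
    """Hypothetical sketch of a timeline's basic operations."""

    def __init__(self, unit: str):
        self.unit = unit
        self.locked = False
        self._instants = {}   # coordinate -> list of (instant_id, kind)
        self._intervals = {}  # interval_id -> (start, end)

    @property
    def length(self):
        """Distance from the origin to the last coordinate holding an Instant."""
        return max(self._instants, default=0)

    def _check(self, coord):
        if coord < 0:
            raise ValueError("coordinates cannot be negative")
        if self.locked and coord > self.length:
            raise ValueError("timeline is locked")

    def add_instant(self, coord, instant_id, kind="instant"):
        self._check(coord)
        self._instants.setdefault(coord, []).append((instant_id, kind))

    def add_interval(self, interval_id, start, end):
        if end < start:
            raise ValueError("EndInstant precedes StartInstant")
        self._check(start)
        self._check(end)  # validate both before mutating anything
        self.add_instant(start, f"{interval_id}.start", "start")
        self.add_instant(end, f"{interval_id}.end", "end")
        self._intervals[interval_id] = (start, end)

    def remove_instant(self, coord, instant_id):
        # Start/End pairing is not re-checked in this sketch.
        kept = [i for i in self._instants.get(coord, []) if i[0] != instant_id]
        if kept:
            self._instants[coord] = kept
        else:
            self._instants.pop(coord, None)

    def lookup(self, coord):
        """All Instants at exactly this coordinate."""
        return list(self._instants.get(coord, []))

    def query(self, coord):
        """All TimeIntervals spanning coord (s < c < e)."""
        return [k for k, (s, e) in self._intervals.items() if s < coord < e]


tl = Timeline("quarter notes")
tl.add_interval("n1", 0, 1)
tl.add_interval("n2", 1, 3)
tl.add_instant(2, "fermata")
print(tl.lookup(1))                          # [('n1.end', 'end'), ('n2.start', 'start')]
print(tl.query(2), tl.query(1), tl.length)   # ['n2'] [] 3
tl.locked = True  # additions beyond coordinate 3 now raise ValueError
```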

Synchrony

The model distinguishes two forms of synchrony:

  • Strict synchrony: Instants sharing identical coordinates on the same timeline.
  • Pragmatic synchrony: Instants binned together on the basis of a threshold (quantisation). Pragmatic synchrony is not yet implemented in the library but is planned as a configurable parameter.
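Pragmatic synchrony is not yet implemented, so the following is only a hypothetical sketch of how threshold-based binning could work. Instants are sorted by coordinate and chained into the same bin while neighbouring coordinates differ by at most `threshold`. This single-linkage rule is chosen here for simplicity. With `threshold=0`, it reduces to strict synchrony.

```python
def bin_synchronous(instants: dict, threshold: float = 0.0) -> list:
    """Hypothetical sketch: group instant IDs whose sorted coordinates lie
    within `threshold` of their neighbour."""
    ordered = sorted(instants.items(), key=lambda item: item[1])
    bins = []
    previous = None
    for instant_id, coord in ordered:
        if bins and coord - previous <= threshold:
            bins[-1].append(instant_id)  # close enough: same bin
        else:
            bins.append([instant_id])    # start a new bin
        previous = coord
    return bins


onsets = {"rh": 1.000, "lh": 1.012, "pedal": 1.030, "next": 1.500}
print(bin_synchronous(onsets))        # [['rh'], ['lh'], ['pedal'], ['next']]
print(bin_synchronous(onsets, 0.02))  # [['rh', 'lh', 'pedal'], ['next']]
```

Chaining lets a bin grow wider than the threshold: here it spans 30 ms with a 20 ms threshold. Fixed-grid quantisation would be an equally plausible design.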

Five default propositions govern Instants on a timeline:

  1. Instants sharing the same coordinate are synchronous.
  2. Temporally overlapping parts of TimeIntervals are strictly synchronous.
  3. A later coordinate implies later occurrence in that domain (no cross-domain implication is entailed).
  4. Every StartInstant necessitates a corresponding EndInstant, and vice versa.
  5. An EndInstant’s coordinate cannot be smaller than the corresponding StartInstant’s.

Flow Control

Contiguity

Definition: Contiguous TimeInterval

A TimeInterval [s, e) is contiguous in the sense that it spans all coordinates between its Start- and EndInstant monotonically. A TimeInterval is said to span a coordinate c if s < c < e.

Definition: Contiguous with

We call contiguous with TimeInterval [s, e) all Instants synchronous with e. By extension, any event located or starting at the end of a TimeIntervalEvent is called contiguous with it.

Contiguity is the default: a section of music flows into the next. Two phenomena override this default.

Breaks

Definition: Break

A break is a control event that voids the contiguity at the Instant where it is located.

Events that would normally be contiguous at a break’s coordinate become discontiguous. Two constraints follow:

  • A TimeInterval cannot be added if it would span a break’s coordinate.
  • A break cannot be placed at a coordinate already spanned by a TimeInterval.
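The following sketch shows the two break constraints and how a break voids contiguity. `BreakTimeline` and its methods are hypothetical names used for illustration only. The example places a break between a prima volta [8, 12) and a seconda volta [12, 16).

```python
class BreakTimeline:
    """Hypothetical sketch: TimeIntervals and breaks on one timeline."""

    def __init__(self):
        self.breaks = set()
        self.intervals = {}  # name -> (start, end)

    @staticmethod
    def _spans(start, end, c):
        return start < c < end

    def add_interval(self, name, start, end):
        if any(self._spans(start, end, b) for b in self.breaks):
            raise ValueError(f"{name} would span a break")
        self.intervals[name] = (start, end)

    def add_break(self, c):
        if any(self._spans(s, e, c) for s, e in self.intervals.values()):
            raise ValueError(f"coordinate {c} is already spanned by a TimeInterval")
        self.breaks.add(c)

    def contiguous(self, first, second):
        """second is contiguous with first if it starts at first's end and no break sits there."""
        end_of_first = self.intervals[first][1]
        return self.intervals[second][0] == end_of_first and end_of_first not in self.breaks


tl = BreakTimeline()
tl.add_break(12)
tl.add_interval("prima volta", 8, 12)     # allowed: 12 is an endpoint, not spanned
tl.add_interval("seconda volta", 12, 16)
print(tl.contiguous("prima volta", "seconda volta"))  # False
```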

Breaks encode phenomena such as the boundary between a prima volta and a seconda volta ending, or the boundary at a coda sign where the default sequential flow is interrupted.

Visualisation: Break

Breaks are represented as red lines orthogonal to the timeline axis.

Jumps

Definition: Jump

A jump is a control event defined by two Instants: a JumpFrom and a JumpTo Instant. When active, it makes any event located or starting at JumpTo contiguous with any event ending at JumpFrom.

Jumps encode repeat signs, dal segno al coda, da capo al fine, and analogous structures. Additional event properties define conditions under which a control event is active (e.g., a fine is effective only after a da capo al fine; prima and seconda volta brackets are conditionally active depending on the traversal count).

Visualisation: Jump

Jumps are represented as dotted arrows from the JumpFrom Instant to the JumpTo Instant. The curvature of the arrow reflects the distance between the two coordinates.

Together, breaks and jumps form the flow-control apparatus of the model. They determine how a timeline is traversed (which sections are played in what order) without altering the underlying coordinate system. The traversal itself is captured by a FlowMap (see Derivative Timelines).
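The function below is a simplified, hypothetical sketch of how a traversal follows from jumps. It ignores breaks and any activation condition beyond a fixed count. It walks the timeline from the origin and takes the next jump that still has activations left. The stretches it plays are recorded as half-open intervals in performance order. `Jump` and `traverse` are illustrative names, not library API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Jump:
    jump_from: int
    jump_to: int
    times_active: int = 1  # e.g. a plain repeat sign is taken once


def traverse(end: int, jumps: List[Jump]) -> List[Tuple[int, int]]:
    """Hypothetical sketch: played stretches [start, stop) in performance order."""
    remaining = [j.times_active for j in jumps]
    pos, path = 0, []
    while True:
        # Jumps still ahead of the current position that can still fire.
        pending = [(j.jump_from, i) for i, j in enumerate(jumps)
                   if j.jump_from > pos and remaining[i] > 0]
        if not pending:
            path.append((pos, end))  # play through to the end
            return path
        jump_from, i = min(pending)  # the nearest pending jump fires first
        path.append((pos, jump_from))
        remaining[i] -= 1
        pos = jumps[i].jump_to


print(traverse(16, [Jump(8, 0), Jump(16, 8)]))  # [(0, 8), (0, 16), (8, 16)]
```

In this example, two repeated sections A = [0, 8) and B = [8, 16) unfold as A, A B, B. The middle interval covers A and B in one contiguous stretch.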

Conversion Maps

Definition: ConversionMap (C-map)

A ConversionMap is a typed function attached to a timeline that maps each coordinate to exactly one value. If the function is bijective, an InverseMap can be derived, which permits conversion back into the timeline’s own unit.

Conversion maps are at the heart of the model. They connect coordinate systems, convert between units, and enable the transfer of information across timelines.

Visualisation: ConversionMap

Conversion maps are visualised as double lines adopting colour and pattern from the type of timeline associated with the measuring unit that they output (see Figure 3). Maps outputting values other than coordinates (e.g., string labels) use distinctly different colours or patterns.

Figure 3: Conversion maps map coordinates to other values, typically coordinates in a different unit. The double-line convention distinguishes C-maps from timelines.

Output types

The output of a C-map can be:

  • A coordinate in the same or a different unit (e.g., pixels to seconds, ticks to quarter notes). This is the most common case.
  • A specifier such as a measure number, a filename, or a label.
  • An auxiliary constant (e.g., a fixed y-coordinate for generating a horizontal line in an image).

When the output is a coordinate, the C-map implicitly defines a derived timeline with variant coordinates. This is how a single DiscreteGraphicalTimeline (pixels) can simultaneously express seconds, quarter notes, and (x, y) pairs.

Bounded and unbounded maps

A C-map can be:

  • Unbounded: applies globally across the entire timeline.
  • Bounded: defines one or several regions of vigour (TimeIntervals) within which it is valid. Outside these regions, the map returns no result.
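The following is a minimal sketch of an affine C-map (the ScalarMultiplicationMap family, y = ax + b). It can be bounded to regions of vigour and inverted when bijective. `AffineMap` is a hypothetical stand-in, not the library class. For simplicity, its `inverse` only handles increasing maps (a > 0), whose half-open regions map onto half-open regions. The example assumes a scan on which 100 pixels correspond to one second, with pixel 0 falling 2 seconds into the recording.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Optional, Tuple

Region = Tuple[Fraction, Fraction]


@dataclass
class AffineMap:
    """Hypothetical affine C-map y = a*x + b, optionally bounded."""

    a: Fraction
    b: Fraction = Fraction(0)
    regions: Optional[List[Region]] = None  # None means unbounded

    def __call__(self, x):
        # A bounded map returns no result outside its half-open regions [s, e).
        if self.regions is not None and not any(s <= x < e for s, e in self.regions):
            return None
        return self.a * x + self.b

    def inverse(self) -> "AffineMap":
        if self.a <= 0:
            raise ValueError("this sketch only inverts increasing maps (a > 0)")
        regions = None
        if self.regions is not None:
            regions = [(self.a * s + self.b, self.a * e + self.b) for s, e in self.regions]
        return AffineMap(Fraction(1) / self.a, -self.b / self.a, regions)


px_to_s = AffineMap(Fraction(1, 100), Fraction(2), regions=[(Fraction(0), Fraction(1000))])
s_to_px = px_to_s.inverse()
print(px_to_s(250), px_to_s(1200))  # 9/2 None
print(s_to_px(Fraction(9, 2)))      # 250
```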

Map families

The implementation organises C-maps into families:

Table 2: Families of ConversionMaps in the timetoalign library.
Family | Description | Example
Affine (ScalarMultiplicationMap) | y = ax + b | pixels to seconds
Lookup (TableMap) | Explicit key-value mapping | tick to tempo
Interpolation | Linear interpolation between anchor points | warp path
Constant | Returns the same value for any input | y-coordinate of a staff line
Straight line | Generates (x, y) pairs for a line | horizontal system in an image
Discretisation | Rounds continuous values to discrete ones | seconds to samples
Periodic (RotationMap) | Wraps coordinates cyclically | repeating ostinato
Metric-aware (FloorMap, MetricMap) | Beat and measure structure | BeatGrid
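Three of the simpler families in Table 2 can be sketched as plain functions:

  • a discretisation map (seconds to samples),
  • a periodic map in the spirit of RotationMap, and
  • a floor-based map that outputs a specifier (a measure number) rather than a coordinate.

These functions are hypothetical illustrations. In particular, rounding down when discretising, the 44.1 kHz sample rate, and a constant four-quarter metre are assumptions, not library defaults.

```python
import math
from fractions import Fraction


def seconds_to_samples(seconds, sample_rate: int = 44100) -> int:
    """Discretisation (hypothetical): continuous seconds -> sample index, rounding down."""
    return math.floor(seconds * sample_rate)


def rotation(period, offset=0):
    """Periodic map (hypothetical): wraps any coordinate into [offset, offset + period)."""
    return lambda x: offset + (x - offset) % period


def measure_number(quarters, quarters_per_measure=4) -> int:
    """Floor-based specifier (hypothetical): 1-based measure number under a constant metre."""
    return math.floor(quarters / quarters_per_measure) + 1


ostinato = rotation(Fraction(4))           # a four-beat ostinato cycle
print(seconds_to_samples(Fraction(1, 2)))  # 22050
print(ostinato(Fraction(19, 2)))           # 3/2, i.e. 1.5 beats into the cycle
print(measure_number(Fraction(19, 2)))     # 3
```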

MultiMaps: composing conversion maps

Three composition strategies produce MultiMaps:

CombinationMap
Yields outputs from multiple C-maps simultaneously. For instance, combining an x-map and a y-map produces (x, y) coordinate pairs.
ChainMap
Applies C-maps in sequence: the output of one becomes the input of the next (e.g., ticks -> beats -> seconds). A ChainMap defines a conversion path and is a key building block for coordinate transfer.
ConcatenationMap
Combines bounded C-maps so that each coordinate region maps to one of them. For instance, a score with five systems has a ConcatenationMap of five StraightLineMaps, each valid within its system’s pixel range.
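The three composition strategies can be sketched with plain closures. The helper names `chain`, `combine`, and `concatenate` are hypothetical; they are not the library's ChainMap, CombinationMap, and ConcatenationMap classes. The numbers are invented for illustration:

  • 480 ticks per quarter note,
  • a constant 0.5 seconds per quarter note, and
  • a page with two systems whose horizontal lines sit at y = 300 and y = 800 pixels.

```python
from typing import Callable, List, Tuple

Map = Callable[[object], object]


def chain(*maps: Map) -> Map:
    """ChainMap-like (hypothetical): apply maps in sequence, stopping if a bounded map returns None."""
    def apply(x):
        for m in maps:
            if x is None:
                return None
            x = m(x)
        return x
    return apply


def combine(*maps: Map) -> Map:
    """CombinationMap-like (hypothetical): evaluate several maps on the same input."""
    return lambda x: tuple(m(x) for m in maps)


def concatenate(pieces: List[Tuple[Tuple[float, float], Map]]) -> Map:
    """ConcatenationMap-like (hypothetical): dispatch to the map whose region [s, e) contains x."""
    def apply(x):
        for (s, e), m in pieces:
            if s <= x < e:
                return m(x)
        return None
    return apply


ticks_to_seconds = chain(lambda t: t / 480, lambda q: q * 0.5)
system1 = combine(lambda x: x + 100, lambda x: 300)  # straight line: (x, y) pairs
system2 = combine(lambda x: x - 900, lambda x: 800)  # local x = x - 1000, drawn from x = 100
page = concatenate([((0, 1000), system1), ((1000, 2000), system2)])
print(ticks_to_seconds(960))              # 1.0
print(page(250), page(1500), page(2500))  # (350, 300) (600, 800) None
```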

Nested Timelines

A timeline can accommodate not only events but also other timelines.

Children

Definition: Child

A timeline is called a Child when it is recursively nested in a parent timeline.

A timeline can accommodate other timelines (Children) provided they use the same measuring unit. Constraints mirror those for TimeIntervalEvents:

  • A Child cannot span an existing in the parent (unless explicitly adopted).
  • A Child may only extend the parent’s length when the parent is not locked.
  • Once added, a Child’s length is locked to prevent side effects.
  • Only the top-level (“root”) timeline can be extended.

Children introduce relative coordinate systems within a parent. A Child’s local origin is mapped to a specific coordinate of the parent; its local coordinates are offsets from that origin. This is essential for modelling multi-system score pages, where each system has its own pixel range but is part of a single graphical timeline.
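The offset arithmetic between a parent and a Child reduces to one subtraction and one addition. The `Child` class below is a hypothetical sketch. It assumes the Child occupies the half-open range [offset, offset + length) of its parent. The pixel values for a two-system page are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Child:
    """Hypothetical sketch of a nested timeline positioned in its parent."""

    name: str
    offset: int  # parent coordinate of the Child's local origin
    length: int  # fixed once the Child has been added

    def to_local(self, parent_coord):
        """Parent -> local coordinate, or None outside [offset, offset + length)."""
        local = parent_coord - self.offset
        return local if 0 <= local < self.length else None

    def to_parent(self, local_coord):
        """Local -> parent coordinate (exact, no rounding involved)."""
        return local_coord + self.offset


system1 = Child("system 1", offset=0, length=1200)
system2 = Child("system 2", offset=1200, length=1100)
measure5 = Child("measure 5", offset=300, length=250)  # nested inside system 2
local = system2.to_local(1500)
print(system1.to_local(1500), local, measure5.to_local(local))  # None 300 0
print(system2.to_parent(measure5.to_parent(125)))              # 1625
```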

Visualisation: Child

A Child is visualised as a timeline whose EndInstant is marked by a hollow diamond (in contrast to the filled arrowhead of a root timeline).

Children can overlap, leave gaps between them, carry their own C-maps, and nest recursively to arbitrary depth.

Figure 4: Nested timelines. The parent timeline contains two children (segments), each with its own local coordinate system.

Segments and SegmentLines

Definition: Segment and SegmentLine

When all Children of the same parent timeline (“siblings”) are contiguous with each other, we call them Segments and the parent a SegmentLine.

The key advantage of a SegmentLine is that contiguous siblings allow cumulative summing of segment lengths and their C-maps (provided they convert to the same unit relative to the segment’s origin). This enables efficient coordinate conversion across the entire structure.

Four operations are associated with Children:

  • Addition: adding a timeline as a Child, including all its events and C-maps.
  • Partitioning: creating a new Child from a parent timeline and a TimeInterval.
  • Segmentation (special case of partitioning): creating k Segments from k+1 split coordinates, turning the parent into a SegmentLine.
  • Concatenation: adding a Child at the parent’s EndInstant (extending the parent).
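Because Segments are contiguous, their offsets are simply cumulative sums of their lengths. A coordinate on the SegmentLine can therefore be located with a binary search. The functions below are a hypothetical sketch, not library API, of three things:

  • segmentation (k + 1 split coordinates giving k Segments),
  • concatenation from lengths, and
  • that lookup.

```python
import bisect
from itertools import accumulate


def segmentation(splits):
    """Hypothetical: k + 1 strictly ascending split coordinates -> k contiguous Segments."""
    if len(splits) < 2 or any(a >= b for a, b in zip(splits, splits[1:])):
        raise ValueError("need at least two strictly ascending split coordinates")
    return list(zip(splits, splits[1:]))


def from_lengths(lengths):
    """Hypothetical: concatenate Segments of the given lengths; offsets are cumulative sums."""
    offsets = [0, *accumulate(lengths)]
    return list(zip(offsets, offsets[1:]))


def locate(segments, coord):
    """Hypothetical: (segment index, local coordinate) for coord, or None outside."""
    starts = [start for start, _ in segments]
    i = bisect.bisect_right(starts, coord) - 1
    if i < 0 or coord >= segments[i][1]:
        return None
    return i, coord - segments[i][0]


segments = segmentation([0, 4, 10, 12])
print(segments == from_lengths([4, 6, 2]))        # True
print(locate(segments, 5), locate(segments, 12))  # (1, 1) None
```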
Table 3: Comparison of Regions, Children, and Segments.
Property | Region | Child | Segment
Is a timeline (holds events) | No | Yes | Yes
Siblings must be contiguous | No | No | Yes
Can span breaks | Yes | No | No

TimeStamps

Definition: TimeStamp

A TimeStamp is a set of values reflecting the coordinate of a root timeline, the synchronous coordinates of all its Children, and the conversion results of the ConversionMaps associated with each of them.

A TimeStamp is a cross-section through a timeline hierarchy at a given coordinate. It answers the question: “At coordinate c of the root timeline, what are the local coordinates in each Child, and what do all C-maps produce?”

By analogy, a TimeIntervalStamp combines a start TimeStamp and an end TimeStamp, representing the cross-section of an interval through the hierarchy.
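A TimeStamp can be sketched as a recursive walk that subtracts each Child's offset and evaluates the C-maps attached at every level. The `Node` class and `timestamp` function are hypothetical. C-maps are modelled as plain functions on local coordinates. The page layout and the 100-pixels-per-second map are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical timeline node with local C-maps and nested Children."""

    name: str
    offset: float = 0             # 0 for the root timeline
    length: float = float("inf")  # the root is left unbounded here
    cmaps: Dict[str, Callable] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def timestamp(node: Node, coord, stamp=None) -> dict:
    """Cross-section at coord: local coordinates of every node containing it, plus C-map outputs."""
    stamp = {} if stamp is None else stamp
    local = coord - node.offset
    if not 0 <= local < node.length:
        return stamp  # this node (and its descendants) is not hit
    stamp[node.name] = local
    for label, cmap in node.cmaps.items():
        stamp[f"{node.name}.{label}"] = cmap(local)
    for child in node.children:
        timestamp(child, local, stamp)
    return stamp


page = Node("page", cmaps={"seconds": lambda px: px / 100}, children=[
    Node("system 1", offset=0, length=1200, cmaps={"y": lambda x: 300}),
    Node("system 2", offset=1200, length=1100, cmaps={"y": lambda x: 800}),
])
print(timestamp(page, 1500))
# {'page': 1500, 'page.seconds': 15.0, 'system 2': 300, 'system 2.y': 800}
```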

Visualisation: TimeStamp

TimeStamps are visualised as a vertical cross-section through a timeline that intersects all Children and ConversionMaps at their synchronous points.

Figure 5: A timeline’s internal data structures. The instants table records all Instants with their coordinates, IDs, and types. The intervals table is derived from the instants table and supports interval queries.

Regions and Derivative Timelines

Regions

A Region is a named part of a timeline defined by a TimeInterval. Regions are useful for referring to parts of a timeline by name (e.g., “Chorus”, “Verse”, “Development”) or by generic labels (e.g., alphabetical letters for representing traversal paths).

Regions are not timelines: they do not accommodate events or C-maps. They can, however, be used as the basis for partitioning a timeline into Children—creating a Child whose boundaries correspond to the Region’s TimeInterval.

Derivative Timelines

Three kinds of derived timelines play a central role in alignment:

FlowMap
A sequence of TimeIntervals (often corresponding to previously defined Regions) specifying a particular path through the timeline’s control structure (repeats, jumps). A FlowMap is computed from flow-control events (Breaks and Jumps) or assembled manually. Its intervals can be partitioned and concatenated, yielding a new SegmentLine that reflects the traversal, i.e. the score unfolded into performance order. For example, a score with a repeat and a dal segno al coda has a FlowMap like [A, B, A, B, C, D], in which some Regions appear more than once. The resulting SegmentLine is a new timeline in performance order.
Converted timeline
A timeline whose coordinates have been transformed to another unit via one or more ConversionMaps, potentially after applying a FlowMap.
WarpMap
A derived timeline whose coordinates have been re-adjusted to align with those of another timeline on the basis of AlignmentAnchors. See Alignment for the full definition of WarpMap and the pipeline that produces it.
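Unfolding a FlowMap into a performance-order SegmentLine amounts to concatenating the listed Regions while remembering where each copy came from. The sketch below is hypothetical: the function names and the Region boundaries are invented. It uses the [A, B, A, B, C, D] example and shows that one score coordinate can correspond to several performance coordinates.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int, str, int]  # (perf_start, perf_end, region, score_start)


def unfold(regions: Dict[str, Tuple[int, int]], flow: List[str]) -> List[Segment]:
    """Hypothetical: concatenate the Regions listed in `flow` into performance order."""
    segments, cursor = [], 0
    for label in flow:
        start, end = regions[label]
        segments.append((cursor, cursor + (end - start), label, start))
        cursor += end - start
    return segments


def to_score(segments: List[Segment], perf_coord):
    """Map a performance-order coordinate back to the original score coordinate."""
    for perf_start, perf_end, _, score_start in segments:
        if perf_start <= perf_coord < perf_end:
            return perf_coord - perf_start + score_start
    return None


def to_performance(segments: List[Segment], score_coord):
    """A score coordinate may occur several times in performance order."""
    return [ps + score_coord - s for ps, pe, _, s in segments if s <= score_coord < s + (pe - ps)]


regions = {"A": (0, 8), "B": (8, 16), "C": (16, 20), "D": (24, 28)}
segments = unfold(regions, ["A", "B", "A", "B", "C", "D"])
print(to_score(segments, 20), to_score(segments, 38))  # 4 26
print(to_performance(segments, 4))                     # [4, 20]
```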

Alignment

The alignment layer is where the model’s cross-domain capability comes together.

MatchClaim

Definition: MatchClaim

A MatchClaim is a claim of equivalence between events on different timelines, with provenance metadata (agent, method, certainty). A MatchClaim may be synchronous (producing AlignmentAnchors) or conceptual (structural equivalence without temporal commitment). It may include special sentinel values (NOMATCH) explicitly encoding the absence of a corresponding event on the other timeline.

A MatchClaim takes the form of named references (IDs) of claimed-equivalent events, combined with metadata on genesis:

  • Agent: who or what produced the claim (human annotator, DTW algorithm, neural model, …).
  • Decision criteria: what evidence was used (temporal similarity, harmonic content, expert judgement, …).
  • Certainty level: how confident the agent is in the claim.

MatchClaims grow incrementally: an initial claim between events A and B can be extended by adding B-C, creating a match path A -> B -> C that connects three timelines. The model explicitly matches events (not coordinates) to allow richer metadata about the alignment procedure.

Two levels of matching are distinguished:

  • Synchronous matching: the matched events occupy corresponding temporal positions (e.g., “this beat in the score corresponds to this moment in the recording”).
  • Conceptual matching: the matched events are semantically equivalent but may not share a temporal correspondence (e.g., “both versions have a Bridge section, but they differ in length and internal structure”).

AlignmentAnchors

Definition: AlignmentAnchor

An AlignmentAnchor is a pure coordinate pair associating one coordinate on timeline A with one coordinate on timeline B. It is a neutral record with no claim semantics: provenance and interpretation are carried by the enclosing MatchClaim. Only synchronous MatchClaims produce AlignmentAnchors.

From a synchronous MatchClaim that associates TimeIntervalEvents, two anchors are derived: a StartAnchor and an EndAnchor. Based on the units present in the aligned timelines, TimeStamps can be constructed.
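The sketch below shows a MatchClaim with provenance metadata and the derivation of a StartAnchor and an EndAnchor from it. The field names, the `NOMATCH` string sentinel, and `derive_anchors` are hypothetical. Event spans are looked up in a plain dictionary, and the coordinates are invented. Conceptual and NOMATCH claims produce no anchors, in line with the rule that only synchronous MatchClaims do.

```python
from dataclasses import dataclass

NOMATCH = "NOMATCH"  # hypothetical sentinel: the event has no counterpart


@dataclass(frozen=True)
class MatchClaim:
    """Hypothetical sketch of a claim between two events on two timelines."""

    tl_a: str
    event_a: str
    tl_b: str
    event_b: str              # an event ID, or NOMATCH
    synchronous: bool = True  # False: conceptual equivalence only
    agent: str = "unknown"    # e.g. a human annotator or a DTW run
    criteria: str = ""
    certainty: float = 1.0


def derive_anchors(claim: MatchClaim, spans: dict) -> list:
    """StartAnchor and EndAnchor as ((tl, coord), (tl, coord)) pairs.

    spans maps (timeline_id, event_id) -> (start, end).
    """
    if not claim.synchronous or claim.event_b == NOMATCH:
        return []  # conceptual and NOMATCH claims yield no anchors
    start_a, end_a = spans[(claim.tl_a, claim.event_a)]
    start_b, end_b = spans[(claim.tl_b, claim.event_b)]
    return [((claim.tl_a, start_a), (claim.tl_b, start_b)),
            ((claim.tl_a, end_a), (claim.tl_b, end_b))]


spans = {("score", "m1"): (0, 4), ("audio", "bar1"): (1.2, 3.4)}
claim = MatchClaim("score", "m1", "audio", "bar1", agent="DTW", certainty=0.8)
absent = MatchClaim("score", "m9", "audio", NOMATCH, agent="annotator")
bridge = MatchClaim("v1", "Bridge", "v2", "Bridge", synchronous=False)
print(derive_anchors(claim, spans))
# [(('score', 0), ('audio', 1.2)), (('score', 4), ('audio', 3.4))]
print(derive_anchors(absent, spans), derive_anchors(bridge, spans))  # [] []
```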

Visualisation: AlignmentAnchor

AlignmentAnchors are visualised as connecting lines between TimeStamps, using the same or a similar visual style as the Matches they derive from.

MatchClaim cases

A MatchClaim always knows which pair of timelines it connects. It contains 0–2 AlignmentAnchors (only if synchronous), synchrony and explicitness flags, and optional metadata (agent, decision criteria, certainty). Four cases are distinguished:

  1. Event-to-event (case a): two timed things on different timelines correspond—the core alignment operation.
  2. Projection (case b): an event is projected onto a timeline with no matching event—useful for transferring annotations.
  3. NOMATCH (case c): an event has no equivalent on the other timeline—a positive assertion of absence.
  4. Implicit (case d): generated automatically by group extension, e.g. when a match between two timelines implies coordinates on sibling timelines within the same TimelineGroup.

MatchGraph and MatchStamp

A MatchGraph is an on-demand analytical structure built from MatchClaims. Nodes are (timeline_id, coordinate) tuples; edges derive from the AlignmentAnchors of synchronous claims. Each connected component yields a MatchStamp: a synchronised instant across multiple timelines. A MatchGraph can be extended with implicit claims based on TimelineGroup membership. In the Hendrix example, each Match box M1–M15 is a MatchGraph, not a single MatchClaim.

Definition: MatchStamp

A MatchStamp is the union of TimeStamps from groups synchronously connected by a MatchClaim or MatchGraph. It contains one coordinate per participating timeline and, with conversion_maps=True (the default), the results of all ConversionMaps associated with those timelines.

A MatchStamp from a single MatchClaim contains the TimeStamps for the claim’s AlignmentAnchors. A MatchStamp from a MatchGraph contains all TimeStamps that the graph connects through its connected components. Like a TimeStamp, it can be expanded with within-group coordinates (delegating to TimelineGroup.get_timestamp_at()) and C-map conversions.
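Building MatchStamps from a MatchGraph is a connected-components problem over (timeline_id, coordinate) nodes. The sketch below uses a union-find structure. `match_stamps` is a hypothetical function that covers only the coordinate part of a MatchStamp, with no C-map conversions. It rejects any component that would place two different coordinates on the same timeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Node = Tuple[str, float]  # (timeline_id, coordinate)


def match_stamps(anchors: Iterable[Tuple[Node, Node]]) -> List[Dict[str, float]]:
    """Hypothetical sketch: one {timeline_id: coordinate} dict per connected component."""
    parent: Dict[Node, Node] = {}

    def find(n: Node) -> Node:
        parent.setdefault(n, n)
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for a, b in anchors:
        parent[find(a)] = find(b)  # union the two components

    components: Dict[Node, Dict[str, float]] = defaultdict(dict)
    for node in list(parent):
        timeline_id, coord = node
        stamp = components[find(node)]
        if stamp.get(timeline_id, coord) != coord:
            raise ValueError(f"conflicting coordinates on {timeline_id}")
        stamp[timeline_id] = coord
    return list(components.values())


anchors = [
    (("score", 0), ("audio", 1.2)),
    (("audio", 1.2), ("video", 1.25)),
    (("score", 4), ("audio", 3.4)),
]
for stamp in match_stamps(anchors):
    print(stamp)
# {'score': 0, 'audio': 1.2, 'video': 1.25}
# {'score': 4, 'audio': 3.4}
```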

The MatchLine

The MatchLine is an ordered sequence of MatchStamps derived from MatchClaims, sorted by coordinate on a source timeline. It serves as input for WarpMap generation.

WarpMap

Definition: WarpMap

A WarpMap is an interpolation-based coordinate mapping derived from a MatchLine; it enables non-linear coordinate transfer between TimelineGroups.

A WarpMap is generated from a MatchLine by extracting (source_coord, target_coord) pairs and building an interpolation table. If the source and target timelines have different units, the warped copy is in the target’s unit and of the appropriate timeline type (e.g., a ContinuousLogicalTimeline warped to a ContinuousPhysicalTimeline). A WarpMap requires at least two matching coordinate pairs for interpolation. It is the mechanism by which non-linear temporal correspondences—such as those between a rubato performance and a metronomic score—are captured.
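A WarpMap of this kind can be sketched as piecewise-linear interpolation over the (source_coord, target_coord) pairs extracted from a MatchLine. The `WarpMap` class below is a hypothetical stand-in, not the library class. As a design choice of the sketch, it refuses to extrapolate beyond the first and last pair. The invented pairs describe a performance that slows from 0.5 to 0.75 seconds per quarter note between beats 4 and 8, then returns to 0.5.

```python
import bisect
from typing import Iterable, Optional, Tuple


class WarpMap:
    """Hypothetical sketch: piecewise-linear interpolation between matched pairs."""

    def __init__(self, pairs: Iterable[Tuple[float, float]]):
        pairs = sorted(pairs)
        if len(pairs) < 2:
            raise ValueError("a WarpMap needs at least two coordinate pairs")
        self.src = [s for s, _ in pairs]
        self.tgt = [t for _, t in pairs]
        if any(a >= b for a, b in zip(self.src, self.src[1:])):
            raise ValueError("source coordinates must be strictly increasing")

    def __call__(self, x: float) -> Optional[float]:
        if not self.src[0] <= x <= self.src[-1]:
            return None  # this sketch does not extrapolate
        # First source coordinate greater than x, clamped to the last pair.
        i = min(bisect.bisect_right(self.src, x), len(self.src) - 1)
        x0, x1 = self.src[i - 1], self.src[i]
        y0, y1 = self.tgt[i - 1], self.tgt[i]
        return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


# (quarter notes, seconds) pairs, e.g. taken from a MatchLine
warp = WarpMap([(0, 0.0), (4, 2.0), (8, 5.0), (12, 7.0)])
print(warp(2), warp(6), warp(13))  # 1.0 3.5 None
```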

AlignmentBundle

Definition: AlignmentBundle

The AlignmentBundle is a collection of TimelineGroups with inter-group connections; it provides coordinate transfer via offset arithmetic, interpolation, and WarpMaps.

The AlignmentBundle provides coordinate transfer at two levels:

  • Within a group: delegates to TimelineGroup.get_timestamp_at(), which uses pairwise interpolation.
  • Across groups: constructs a MatchLine from the cross-group claims, generates WarpMaps on demand, and uses those WarpMaps for coordinate transfer.

The canonical user-facing query is bundle.get_matchstamp_at(coord, tl_id), which returns a MatchStamp spanning all groups connected to the queried timeline.

The full alignment object hierarchy, from atomic to composite, is:

AlignmentAnchor      (pure coordinate pair, no semantics)
    ↓
MatchClaim           (semantic claim about two timelines)
    ↓
MatchGraph           (on-demand graph from multiple claims)
    ↓
MatchStamp           (synchronised instant across N timelines)
    ↓
MatchLine            (ordered sequence of MatchStamps)
    ↓
WarpMap              (materialised timeline copy)

Coordinate transfer

The model implements three tiers of coordinate transfer:

  1. Offset arithmetic for parent–child relationships: a child’s coordinate is the parent’s coordinate minus the child’s offset. Exact and lossless.
  2. Interpolation within TimelineGroups: timelines of potentially different units (e.g., seconds and samples) are connected via pairwise linear interpolation.
  3. WarpMap for cross-group alignment: derived from a MatchLine, a WarpMap materialises a complete copy of a source timeline warped to a target timeline’s coordinates.

Two timelines are commensurable if they are connected by a chain of C-maps, by membership in the same TimelineGroup, or by cross-group MatchClaims. Commensurability is the precondition for coordinate transfer.
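Commensurability can be read as reachability in a graph. Its nodes are timelines, and its edges are C-map chains, shared TimelineGroup membership, or cross-group MatchClaims. The sketch below is hypothetical. In particular, it treats every link as usable in both directions, which for a C-map holds only when an InverseMap exists.

```python
from collections import defaultdict, deque
from typing import Iterable, Tuple


def commensurable(links: Iterable[Tuple[str, str]], a: str, b: str) -> bool:
    """Hypothetical sketch: breadth-first search over timeline links.

    Every link is treated as bidirectional (for a C-map this assumes an InverseMap).
    """
    graph = defaultdict(set)
    for x, y in links:
        graph[x].add(y)
        graph[y].add(x)
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for neighbour in graph[node] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return False


links = [("score_px", "score_qn"), ("score_qn", "audio_s"), ("audio_s", "audio_samples")]
print(commensurable(links, "score_px", "audio_samples"))  # True
print(commensurable(links, "score_px", "video_frames"))   # False
```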

This layered architecture is what enables TimeToAlign! to propagate annotations, transfer coordinates, and answer queries across arbitrarily many representations of the same musical content—the central promise of the model.

Five Demonstration Scenarios

The TISMIR manuscript demonstrates the model through five worked examples of increasing complexity. Each is accompanied by real-world data in the specimen collection and (where applicable) a tutorial notebook.

Table 4: The five demonstration scenarios from the TISMIR manuscript.
# | Scenario | Key concepts introduced | Specimen
1 | SUPRA: Aligning a historical piano roll with a digital score | Timeline (six types), C-map, Child, TimelineGroup, commensurability, TimeStamp | specimens/supra_rolls/
2 | Chorissimo: Flow control in a school song with repeats | Segment, SegmentLine, Region, Break, Jump, FlowMap, RotationMap, unfolding | specimens/audiolabs_omr/Chorissimo_Blue081/
3 | Thoresen: Transferring annotations between graphical analyses | MatchClaim, InverseMap, y-coordinate warping via custom C-map | specimens/thoresen/
4 | Beethoven: Multimodal alignment across 23 timelines | FlowMap (unfolding at scale), AlignmentBundle, AlignmentAnchor, WarpMap, MatchLine | specimens/beethoven_op18-4_iv/
5 | Hendrix: Studying the genesis of a rock song across three versions | MatchClaim (conceptual/synchronous), MatchGraph, NOMATCH, MatchStamp, certainty metadata | specimens/hendrix/

For details, see the manuscript and the tutorial notebooks (links to be added when the tutorials are integrated into this site).