Solid Rock Hebrew Bible with Cantillation
Mikra according to the Masora accents mapped onto the Solid Rock Hebrew Bible
What are cantillation marks?
Cantillation marks (Hebrew: te'amim, טעמים) are
the accent signs in the Hebrew Bible that indicate how each word is chanted during synagogue
reading. They also function as a system of punctuation, marking phrase divisions and clause
boundaries. Each word in the Hebrew Bible traditionally carries one cantillation accent.
In Unicode, cantillation marks occupy the range U+0591–U+05AF and appear as small marks
above or below the consonant letters, distinct from the vowel points (nikkud,
U+05B0–U+05BD).
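As a minimal sketch of how these code point ranges can be used, the helpers below classify a character and strip accents from a word. The function names are illustrative and are not taken from the project's source code.

```rust
/// Returns true if `c` lies in the cantillation range U+0591 to U+05AF.
fn is_cantillation(c: char) -> bool {
    ('\u{0591}'..='\u{05AF}').contains(&c)
}

/// Returns true if `c` lies in the vowel point range quoted above (U+05B0 to U+05BD).
fn is_vowel_point(c: char) -> bool {
    ('\u{05B0}'..='\u{05BD}').contains(&c)
}

/// Removes cantillation marks, leaving consonants and vowel points intact.
fn strip_cantillation(word: &str) -> String {
    word.chars().filter(|c| !is_cantillation(*c)).collect()
}
```

For example, `is_cantillation('\u{0596}')` (tipcha) is true, while `is_cantillation('\u{05B8}')` (kamatz) is false.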
The two source editions
Solid Rock Hebrew Bible (SRHB) is a TEI XML critical edition by Joey McCollum with 2,500+ textual
adjustments. It includes consonants and vowel points but is missing cantillation marks.
Mikra according to the Masora (MapM) is a Hebrew Bible text available on Hebrew Wikisource that includes full cantillation marks along with consonants and vowel points.
Both editions draw on important manuscript traditions but each has its own editorial methodology
and textual decisions. This project uses MapM as the source for cantillation accents and maps
them onto the Solid Rock text word by word.
The transposition algorithm
The process runs in four stages:
1. Import. Both texts are parsed and loaded into a SQLite database. Solid Rock is parsed from TEI XML files
(via the quick-xml crate). MapM is parsed from a JSON export. Each word is stored with
its book, chapter, verse, and position number.
2. Normalize for comparison. Before comparing words, both texts are normalized as follows:
- Strip all cantillation marks (U+0591–U+05AF) and punctuation
- Strip meteg (U+05BD) — MapM includes it, Solid Rock does not
- Normalize kamatz katan (U+05C7) to regular kamatz (U+05B8)
- Split MapM words joined by maqaf (U+05BE) into separate tokens
- Filter out standalone paseq (U+05C0) tokens from MapM
- Apply NFC normalization to resolve combining mark order differences
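The normalization rules above can be sketched roughly as follows. This is an illustration, not the project's actual code: the function names are hypothetical, the set of punctuation characters stripped (maqaf, paseq, sof pasuq, and ASCII punctuation) is an assumption, and the NFC step is left out because the Rust standard library has no Unicode normalization (a crate such as unicode-normalization would be needed for it).

```rust
/// Normalizes one word for comparison.
/// Assumption: NFC normalization is applied separately (not available in std).
fn normalize_for_comparison(word: &str) -> String {
    word.chars()
        .filter_map(|c| match c {
            '\u{0591}'..='\u{05AF}' => None,              // cantillation marks
            '\u{05BD}' => None,                           // meteg
            '\u{05BE}' | '\u{05C0}' | '\u{05C3}' => None, // maqaf, paseq, sof pasuq (assumed list)
            '\u{05C7}' => Some('\u{05B8}'),               // kamatz katan -> regular kamatz
            c if c.is_ascii_punctuation() => None,        // assumed: ASCII punctuation is dropped too
            other => Some(other),
        })
        .collect()
}

/// Splits a MapM verse into comparison tokens: breaks on whitespace and maqaf,
/// drops standalone paseq tokens, then normalizes each remaining token.
fn mapm_tokens(verse: &str) -> Vec<String> {
    verse
        .split(|c: char| c.is_whitespace() || c == '\u{05BE}')
        .filter(|t| !t.is_empty() && *t != "\u{05C0}")
        .map(|t| normalize_for_comparison(t))
        .filter(|t| !t.is_empty())
        .collect()
}
```

With this sketch, a MapM string containing two maqaf-joined words, a standalone paseq, and a third word yields three tokens, each without accents or meteg.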
3. Match and transpose. For each verse, Solid Rock words and MapM words are aligned by word position (1st word ↔ 1st
word, 2nd ↔ 2nd, etc.). For each pair:
- If the normalized forms match → Matched (confidence 1.0). Cantillation marks from the MapM word are applied to the Solid Rock word.
- If the normalized forms differ → Mismatch (confidence 0.5). The original Solid Rock word is displayed without cantillation, since
applying accents from a non-matching word would be misleading. These words await resolution
from proper manuscript sources.
- If no MapM word exists at that position → Not in MapM (confidence 0.0). The Solid Rock word is kept without cantillation.
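The three outcomes can be expressed as a small positional alignment routine. The sketch below assumes both word lists have already been normalized; the `Status` enum and `align_verse` function are illustrative names, not the project's own.

```rust
/// Outcome of comparing one Solid Rock word with the MapM word at the same position.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Matched,
    Mismatch,
    NotInMapm,
}

impl Status {
    /// Confidence scores as listed above.
    fn confidence(self) -> f64 {
        match self {
            Status::Matched => 1.0,
            Status::Mismatch => 0.5,
            Status::NotInMapm => 0.0,
        }
    }
}

/// Aligns the normalized words of one verse strictly by position.
fn align_verse(sr: &[&str], mapm: &[&str]) -> Vec<Status> {
    sr.iter()
        .enumerate()
        .map(|(i, sr_word)| match mapm.get(i) {
            Some(m) if m == sr_word => Status::Matched,
            Some(_) => Status::Mismatch,
            None => Status::NotInMapm,
        })
        .collect()
}
```

For a Solid Rock verse of three words and a MapM verse of two words where only the first pair agrees, this returns `Matched`, `Mismatch`, `NotInMapm` in that order.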
4. Apply accents to matched words. For words that match, the accent application works character by character:
- Build a "skeleton" of the MapM word: each non-cantillation character paired with any
cantillation marks that follow it.
- Walk through the Solid Rock word and MapM skeleton in parallel.
- When a Solid Rock character matches the corresponding MapM skeleton character, insert the
associated cantillation marks after it.
- When characters don't match, advance both pointers (best-effort alignment).
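A literal reading of these four steps might look like the sketch below. The names `build_skeleton` and `apply_accents` are hypothetical, and the code makes no attempt to resynchronize when one side has a character the other lacks (for example a meteg present only in MapM); it simply advances both pointers, as described.

```rust
fn is_cantillation(c: char) -> bool {
    ('\u{0591}'..='\u{05AF}').contains(&c)
}

/// Pairs each non-cantillation character of the MapM word with the
/// cantillation marks that follow it.
fn build_skeleton(mapm_word: &str) -> Vec<(char, String)> {
    let mut skeleton: Vec<(char, String)> = Vec::new();
    for c in mapm_word.chars() {
        if is_cantillation(c) {
            // Attach the accent to the preceding base character;
            // an accent with no preceding character is dropped.
            if let Some((_, marks)) = skeleton.last_mut() {
                marks.push(c);
            }
        } else {
            skeleton.push((c, String::new()));
        }
    }
    skeleton
}

/// Walks the Solid Rock word and the MapM skeleton in parallel, copying
/// accents wherever the characters at the same index agree.
fn apply_accents(sr_word: &str, mapm_word: &str) -> String {
    let skeleton = build_skeleton(mapm_word);
    let mut out = String::new();
    for (i, c) in sr_word.chars().enumerate() {
        out.push(c);
        if let Some((base, marks)) = skeleton.get(i) {
            if *base == c {
                out.push_str(marks);
            }
        }
    }
    out
}
```

Because accents are only copied where characters agree, a disagreement never places an accent on the wrong letter; it just leaves that letter unaccented.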
Results
| Status | Words | Percentage |
|---|---|---|
| Matched (cantillation applied) | 283,868 | 92.8% |
| Mismatch (text differs, shown without cantillation) | 21,334 | 7.0% |
| Not in MapM | 636 | 0.2% |
| Total | 305,838 | 100.0% |
Why do mismatches occur?
The two editions sometimes have different words at the same position in a verse. Common causes
include:
- Different word counts. One edition may split or join words differently (e.g.
with maqaf), causing all subsequent words in the verse to be offset by one or more
positions.
- Textual variants. The editions sometimes preserve different readings of the same
passage.
- Orthographic differences. Plene vs. defective spelling, or different vowel traditions
for the same word.
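The offset effect described in the first bullet is easy to reproduce with placeholder tokens. In this illustrative sketch (the function name and the Latin stand-in words are assumptions, not project data), one edition writes a single word where the other has two, and every later position in the verse then fails to match.

```rust
/// Compares two word lists strictly by position and reports, for each
/// word of the first list, whether the other list has the same word there.
fn positional_matches(sr: &[&str], mapm: &[&str]) -> Vec<bool> {
    sr.iter()
        .enumerate()
        .map(|(i, w)| mapm.get(i) == Some(w))
        .collect()
}
```

Comparing `["a", "bc", "d", "e"]` against `["a", "b", "c", "d", "e"]` gives `[true, false, false, false]`: a single split word produces three mismatches, even though "d" and "e" occur in both lists.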
Improving mismatch resolution — for instance through fuzzy matching or context-aware word
alignment — is an active area of development.
How to read the viewer
Click any word in the viewer to see a tooltip with:
- Status and confidence score
- SR Original — the word as it appears in Solid Rock (without cantillation)
- Result — for matched words, the cantillated result; for mismatches, the
original SR word
- Notes — for mismatches, shows what both SR and MapM texts looked like at
that position
Source code and data
Updating the Solid Rock text
This project tracks the Solid Rock Hebrew Bible as a Git submodule pointing to the upstream repository at github.com/jjmccollum/solid-rock-hb.
When updates are pushed to that repository, we incorporate them by:
- Pulling the latest submodule:
  git submodule update --remote solid-rock-hb
- Re-importing:
  cargo run -- import-solid-rock-chirho
- Re-running transposition:
  cargo run -- transpose-chirho
- Re-exporting:
  cargo run -- export-chirho
- Redeploying the site and PDF
Any corrections to the Solid Rock text (such as typo fixes or future edition updates) are picked
up automatically through this pipeline. The entire process from import to export takes under a
minute.