Rolling working document for bibliography migration, identifier assignment, URL permanence, and editorial operations during the current Omeka staging phase.
Last updated: 2026-04-25
Current platform: https://new.kaanu.org (to be moved to archive.kaanu.org)
Current legacy source: Quartz/Obsidian vault plus WIKINDX where recoverable
Purpose
This handbook is the practical operating document for the Kaanu archivist/editor.
Use it to keep the following stable across migration waves:
- what gets imported
- how records are identified
- what public URLs mean
- how redirects are managed
- when PDFs can be made public
- what the current batch status is
Current operating decisions
- Omeka S is the staging archive and current system of record.
- Zotero is the migration buffer and editorial cleanup layer.
- Quartz remains the legacy source until migration is complete.
- PDFs should be batch-attached through Omeka workflows, not manually item by item.
- Public item URLs must be permanent and independent of Omeka internal IDs.
Permanent identifier rule
Every bibliographic record gets one permanent Kaanu identifier at the moment it is accepted into the migration ledger.
Format
- Prefix: kb
- Numeric body: zero-padded 6 digits
- Examples: kb000001, kb000842
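The format can be checked mechanically. The following is a minimal sketch, not part of the existing toolkit; the function names are hypothetical, and it assumes the sequence starts at 1 (the first example is kb000001) and stops at 999999.

```python
import re

# Pattern taken from the format rules: "kb" followed by six digits.
KAANU_ID_RE = re.compile(r"kb[0-9]{6}")


def format_kaanu_id(sequence_number: int) -> str:
    """Render a ledger sequence number as a kaanu_id, e.g. 842 -> 'kb000842'."""
    # Assumption: numbering runs from 1 to 999999; the handbook does not
    # say what happens beyond six digits.
    if not 1 <= sequence_number <= 999_999:
        raise ValueError(f"sequence number out of range: {sequence_number}")
    return f"kb{sequence_number:06d}"


def is_valid_kaanu_id(value: str) -> bool:
    """True only when the whole string matches the format."""
    return KAANU_ID_RE.fullmatch(value) is not None
```

A check like this could run on every ledger row before import, so that a malformed identifier never reaches Omeka.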
Rule
- The identifier is assigned to the intellectual record, not to a particular file path.
- Once assigned, a kaanu_id is never reused.
- If a record is withdrawn, merged, or suppressed, its kaanu_id remains reserved.
- If two records are merged, one kaanu_id becomes canonical and the other becomes a retired identifier that redirects to the canonical record.
Assignment practice
- Assign identifiers sequentially from the migration ledger.
- Do not wait for final publication to assign a kaanu_id.
- The kaanu_id should be stored in:
  - the migration ledger
  - Omeka as a persistent identifier field
  - any redirect map
  - any future export or API feed
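Sequential assignment without reuse can be sketched as "highest identifier in the ledger plus one". The helper below is hypothetical and assumes the ledger still lists every identifier ever issued, including withdrawn, merged, and suppressed ones.

```python
def next_kaanu_id(ledger_ids: list[str]) -> str:
    """Return the identifier that follows the highest one already issued.

    ledger_ids must contain every kaanu_id ever assigned, including
    retired and suppressed ones, because those stay reserved.
    """
    # Strip the "kb" prefix and compare the numeric bodies.
    highest = max((int(kaanu_id[2:]) for kaanu_id in ledger_ids), default=0)
    return f"kb{highest + 1:06d}"
```

Taking the maximum rather than filling gaps is a deliberate choice here: a gap left by a withdrawn record is never handed out again.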
Permanent URL rule
Kaanu should use one identifier-based canonical public URL for each item.
Canonical item URL
Pattern:
https://kaanu.org/bib/{kaanu_id}
Examples:
https://kaanu.org/bib/kb000001
https://kaanu.org/bib/kb000842
Canonical PDF URL
Pattern:
https://kaanu.org/bib/{kaanu_id}/download
Example:
https://kaanu.org/bib/kb000001/download
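Both patterns derive from the kaanu_id alone, so they can be generated rather than typed into the ledger. A minimal sketch, with hypothetical function names:

```python
# Base taken from the canonical item URL pattern.
CANONICAL_BASE = "https://kaanu.org/bib"


def canonical_item_url(kaanu_id: str) -> str:
    """Canonical public URL of the item page."""
    return f"{CANONICAL_BASE}/{kaanu_id}"


def canonical_pdf_url(kaanu_id: str) -> str:
    """Canonical public URL of the PDF, when one is allowed to be public."""
    return f"{canonical_item_url(kaanu_id)}/download"
```

These could fill the canonical_item_url and canonical_pdf_url ledger columns in one pass.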
Why this rule
- The URL does not depend on title wording.
- The URL does not depend on Quartz or Omeka path structure.
- The URL does not expose implementation details like /item/847.
- The bibliographic record remains stable even if the file becomes private or is replaced.
What must not be canonical
These may exist internally or as redirects, but should not be the long-term public identifier:
- Omeka numeric item URLs such as /item/10
- title-only slug URLs
- paths with platform names like /omeka/, /quartz/, /new/
- file extensions like .html or .php
- workflow status markers such as draft, review, or latest
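One way to enforce this list is to accept only URLs that match the canonical patterns exactly and reject everything else. The check below is a sketch under that assumption; the function name is hypothetical.

```python
import re

# Only the two canonical shapes are accepted: the item URL and its /download.
CANONICAL_URL_RE = re.compile(r"https://kaanu\.org/bib/kb[0-9]{6}(/download)?")


def is_canonical_public_url(url: str) -> bool:
    """True when the URL is exactly a canonical item or PDF URL."""
    return CANONICAL_URL_RE.fullmatch(url) is not None
```

Because the check is an allow-list, Omeka numeric URLs, slugs, platform paths, and file extensions all fail without needing a rule each.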
Redirect rule
Every known legacy bibliography URL should redirect to the canonical Kaanu item URL.
Redirect policy
- Use 301 redirects when cutover is final.
- Build redirects from the migration ledger, not ad hoc in server config.
- Redirect from old Quartz title pages to https://kaanu.org/bib/{kaanu_id}.
- Do not redirect directly to raw PDF URLs.
If an item is merged
- old kaanu_id redirects to the surviving canonical kaanu_id
- legacy Quartz URLs for both records redirect to the surviving canonical record
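The redirect and merge rules above can be sketched as a single map built from ledger data. The input shapes (a dict of legacy URL to kaanu_id, and a dict of retired kaanu_id to surviving kaanu_id) are assumptions for illustration, not the actual ledger layout.

```python
def build_redirects(
    legacy_urls: dict[str, str], merged_into: dict[str, str]
) -> dict[str, str]:
    """Map every legacy or retired URL to a canonical item URL.

    legacy_urls: legacy URL -> kaanu_id recorded in the ledger
    merged_into: retired kaanu_id -> surviving kaanu_id
    """
    base = "https://kaanu.org/bib"

    def surviving(kaanu_id: str) -> str:
        # Follow merge chains (a -> b -> c) to the final canonical record.
        seen = set()
        while kaanu_id in merged_into:
            if kaanu_id in seen:
                raise ValueError(f"merge cycle involving {kaanu_id}")
            seen.add(kaanu_id)
            kaanu_id = merged_into[kaanu_id]
        return kaanu_id

    redirects = {}
    for legacy_url, kaanu_id in legacy_urls.items():
        # Targets are always item pages, never raw PDF URLs.
        redirects[legacy_url] = f"{base}/{surviving(kaanu_id)}"
    for retired in merged_into:
        redirects[f"{base}/{retired}"] = f"{base}/{surviving(retired)}"
    return redirects
```

The resulting map could then be exported to whatever form the web server needs, which keeps the ledger as the single source for redirects.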
Canonical metadata rule
Each item should eventually carry these core fields:
- kaanu_id
- canonical item URL
- canonical PDF URL if public
- canonical DOI (Zenodo-minted), held in bibo:doi
- pre-existing publisher DOI(s) for cross-reference, held in kaanu:otherDoi
- legacy source path or URL
- Zotero bridge identifier
- Omeka internal item ID
- PDF rights status
Required ledger columns
The migration ledger should now track the following minimum fields:
- kaanu_id
- canonical_item_url
- canonical_pdf_url
- kaanu_doi (the Zenodo-minted DOI, populated after the Phase 2D-4 mint pass)
- other_doi (any pre-existing publisher DOI, semicolon-separated if more than one)
- omeka_item_id
- zotero_bridge_identifier
- legacy_path_or_url
- legacy_rel_path
- legacy_title
- year_guess
- source_type
- pdf_present
- pdf_url
- pdf_rights_status
- public_file_allowed
- original_source_url
- candidate_zotero_item
- import_status
- redirect_needed
- redirect_target_url
- duplicate_group
- notes
The template at migration/templates/migration_ledger_template.csv has been expanded to reflect this.
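A ledger file can be checked against this minimum column set before each wave. The sketch below uses only the column names listed above; the function name is hypothetical, and it takes the CSV as text so it makes no assumption about where the file lives.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "kaanu_id", "canonical_item_url", "canonical_pdf_url", "kaanu_doi",
    "other_doi", "omeka_item_id", "zotero_bridge_identifier",
    "legacy_path_or_url", "legacy_rel_path", "legacy_title", "year_guess",
    "source_type", "pdf_present", "pdf_url", "pdf_rights_status",
    "public_file_allowed", "original_source_url", "candidate_zotero_item",
    "import_status", "redirect_needed", "redirect_target_url",
    "duplicate_group", "notes",
]


def missing_ledger_columns(csv_text: str) -> list[str]:
    """Return the required columns absent from the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {name.strip() for name in header}
    return [column for column in REQUIRED_COLUMNS if column not in present]
```

An empty result means the header is complete; extra columns are tolerated.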
DOI policy
Every Kaanu record receives a Zenodo-minted DOI as its canonical DOI. Pre-existing publisher DOIs (CrossRef, DataCite-elsewhere) are not used as the canonical Kaanu DOI; they are captured separately for cross-reference.
Two fields, two roles
- bibo:doi is the canonical Kaanu DOI, minted by Zenodo on deposit. Single value per record. Populated by the Phase 2D-4 mint pass and on every new accession thereafter.
- kaanu:otherDoi is multivalued. Holds any pre-existing publisher DOI, in DOI-name form (10.NNNN/...). Empty for records without a prior DOI.
Order of operations
- record_other_dois.py runs first on any record set that has been imported from Zotero or other sources. It pattern-matches DOI strings in dcterms:source, dcterms:identifier, dcterms:bibliographicCitation, and any imported Zotero DOI field, normalises them, and writes them to kaanu:otherDoi.
- mint_zenodo_dois.py then deposits each record into the Kaanu community on Zenodo and writes the minted DataCite DOI back to bibo:doi.
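The pattern-match-and-normalise step might look roughly like the sketch below. This is not the code of record_other_dois.py; the regular expression, the lower-casing, and the trailing-punctuation rule are all assumptions about what a reasonable normalisation could be.

```python
import re

# Assumed pattern: "10.", a 4-9 digit registrant code, "/", then a suffix
# that runs until whitespace, a quote, or an angle bracket.
DOI_RE = re.compile(r"10\.[0-9]{4,9}/[^\s\"<>]+")


def normalise_doi(raw: str) -> str:
    """Reduce a matched string to bare DOI-name form."""
    # Assumption: trailing sentence punctuation is not part of the DOI.
    # A DOI that really ends in one of these characters would be cut short.
    doi = raw.strip().rstrip(".,;)")
    # DOI names are case-insensitive, so lower-casing makes duplicates equal.
    return doi.lower()


def find_dois(*field_values: str) -> list[str]:
    """Collect unique normalised DOIs from several field values, in order."""
    found = []
    for value in field_values:
        for match in DOI_RE.finditer(value):
            doi = normalise_doi(match.group(0))
            if doi not in found:
                found.append(doi)
    return found
```

Resolver prefixes such as https://doi.org/ or doi: drop away on their own, because matching starts at the 10. prefix.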
Zenodo fair-usage caveat
Zenodo has a fair-usage policy for bulk deposit. The relevant red flags for a project of Kaanu’s shape are: dividing a single dataset into many records to circumvent upload limits, uploading very large numbers of records independent of data volume, and uploading content where the main purpose is indexing or archiving or promotion. Kaanu is a curated bibliographic archive at the scale of around 2,000 records. Before the first bulk run, write to Zenodo for an upfront agreement on the use case.
Editor backstop
If Zenodo declines records that already have a publisher DOI, those records keep their existing DOI in bibo:doi and kaanu:otherDoi is left empty for that record. The canonical-DOI rule still holds: every Kaanu record has exactly one bibo:doi.
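The backstop can be sketched as a small decision function. The name and signature are hypothetical. The handbook does not say what happens when a declined record has more than one publisher DOI; this sketch assumes the first becomes bibo:doi and any others stay in kaanu:otherDoi.

```python
from typing import Optional


def resolve_doi_fields(
    zenodo_doi: Optional[str], publisher_dois: list[str]
) -> tuple[Optional[str], list[str]]:
    """Return (bibo_doi, other_dois) for one record.

    zenodo_doi is None when Zenodo has not minted (or has declined) a DOI.
    """
    if zenodo_doi:
        # Normal case: Zenodo DOI is canonical, publisher DOIs are cross-refs.
        return zenodo_doi, list(publisher_dois)
    if publisher_dois:
        # Backstop: the existing DOI becomes canonical and is not repeated
        # in kaanu:otherDoi. Keeping any further DOIs there is an assumption.
        return publisher_dois[0], list(publisher_dois[1:])
    # No DOI at all yet: the record still awaits minting.
    return None, []
```

For the common single-DOI case this yields an empty kaanu:otherDoi, as the backstop describes.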
Display rule
The item page shows the canonical DOI (bibo:doi) on the headline citation line. If kaanu:otherDoi is populated, a “Publisher DOI” line appears below it. The user-visible distinction is “this is Kaanu’s DOI” versus “this DOI was assigned by the original publisher”.
Editorial workflow
For each incoming or migrated record:
- Create or confirm the ledger row.
- Assign a kaanu_id.
- Set the canonical item URL from the kaanu_id.
- Decide whether the file is:
  - metadata only
  - public PDF allowed
  - private/off-platform
- Clean the metadata in Zotero.
- Import into Omeka.
- Confirm the Omeka item is mapped back to the ledger row.
- Run record_other_dois.py to capture any pre-existing publisher DOI into kaanu:otherDoi.
- Run mint_zenodo_dois.py to mint and store the canonical DOI in bibo:doi.
- Add redirect coverage for any known legacy URL.
PDF policy
The item page is the permanent object of record.
The PDF is only one representation of that object.
This means:
- if a PDF is public, expose it at .../download
- if a PDF later needs to be withdrawn, keep the item URL alive
- if a better file replaces the original, keep the item URL unchanged
Batch import policy
Do not run the migration as:
- metadata first by one process
- PDFs later by manual cleanup
Instead, aim for one repeatable batch pipeline:
- cleaned metadata CSV
- encoded PDF URL where allowed
- Omeka import in controlled waves
Recommended wave size after pilot validation:
- 50 to 150 records per run
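Splitting a cleaned record set into waves is a simple chunking step. A minimal sketch, with a hypothetical function name and a default of 100 chosen arbitrarily from inside the recommended range:

```python
def split_into_waves(rows: list, wave_size: int = 100) -> list[list]:
    """Split ledger rows into consecutive import waves.

    The final wave holds whatever is left over and may be smaller.
    """
    # Enforce the recommended post-pilot range of 50 to 150 records.
    if not 50 <= wave_size <= 150:
        raise ValueError("wave_size should be between 50 and 150")
    return [rows[i:i + wave_size] for i in range(0, len(rows), wave_size)]
```

At around 2,000 records and 150 per run, this gives 14 waves, the last one holding 50 records.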
Search and indexing policy
To avoid indexing problems:
- expose only one canonical public item URL per record
- add canonical tags to public item pages at final domain cutover
- keep staging domains out of indexing where appropriate
- generate sitemap entries from canonical item URLs only
- keep redirects stable and permanent
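Generating the sitemap from kaanu_id values alone keeps non-canonical URLs out of it by construction. The sketch below uses the standard sitemap namespace; the function name is hypothetical, and it assumes the caller passes only identifiers of publicly visible items.

```python
import xml.etree.ElementTree as ET


def build_sitemap(kaanu_ids: list[str]) -> str:
    """Return sitemap XML listing one canonical item URL per kaanu_id."""
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )
    # De-duplicate and sort so repeated runs produce the same file.
    for kaanu_id in sorted(set(kaanu_ids)):
        url = ET.SubElement(urlset, "url")
        # Item pages only: PDF /download URLs are deliberately left out.
        ET.SubElement(url, "loc").text = f"https://kaanu.org/bib/{kaanu_id}"
    return ET.tostring(urlset, encoding="unicode")
```

Leaving PDF URLs out follows the principle that the item page, not the file, is the object of record.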
Rolling notes
2026-04-25
- Adopted Zenodo-for-all DOI strategy: every Kaanu record receives a Zenodo-minted DOI in bibo:doi
- Added kaanu:otherDoi to the Kaanu vocabulary (literal, multivalued) for any pre-existing publisher DOI
- Added two scripts to the migration toolkit: record_other_dois.py (captures pre-existing DOIs) and mint_zenodo_dois.py (mints canonical Zenodo DOI and writes back to bibo:doi)
- Removed the earlier three-tier DOI plan (existing then CrossRef then Zenodo); CrossRef enrichment is no longer in scope
- Added Zenodo fair-usage caveat: write to Zenodo for an upfront agreement before the first bulk run
- Editor backstop documented: if Zenodo declines records with existing publisher DOIs, those keep their existing DOI in bibo:doi
- Expanded migration ledger to include kaanu_doi and other_doi columns
2026-04-19
- Adopted permanent Kaanu bibliographic identifier format: kb000001
- Adopted canonical item URL pattern: https://kaanu.org/bib/{kaanu_id}
- Adopted canonical PDF URL pattern: https://kaanu.org/bib/{kaanu_id}/download
- Expanded migration ledger template to include kaanu_id, canonical URLs, rights fields, and redirect target
- Agreed that title-slug URLs may exist as aliases later, but must not be canonical