Kaanu archivist handbook

Rolling working document for bibliography migration, identifier assignment, URL permanence, and editorial operations during the current Omeka staging phase.

Last updated: 2026-04-25
Current platform: https://new.kaanu.org (to be moved to archive.kaanu.org)
Current legacy source: Quartz/Obsidian vault plus WIKINDX where recoverable

Purpose

This handbook is the practical operating document for the Kaanu archivist/editor.

Use it to keep the following stable across migration waves:

what gets imported
how records are identified
what public URLs mean
how redirects are managed
when PDFs can be made public
what the current batch status is

Current operating decisions

Omeka S is the staging archive and current system of record.
Zotero is the migration buffer and editorial cleanup layer.
Quartz remains the legacy source until migration is complete.
PDFs should be batch-attached through Omeka workflows, not manually item by item.
Public item URLs must be permanent and independent of Omeka internal IDs.

Permanent identifier rule

Every bibliographic record gets one permanent Kaanu identifier at the moment it is accepted into the migration ledger.

Format

Prefix: kb
Numeric body: zero-padded 6 digits
Examples:
- kb000001
- kb000842
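
The format can be checked mechanically. The following is a minimal sketch, not part of the existing toolkit: the function names are hypothetical, and it assumes numbering starts at 1, as the examples suggest.

```python
import re

# "kb" followed by exactly six ASCII digits, e.g. kb000842.
KAANU_ID_RE = re.compile(r"kb[0-9]{6}")


def format_kaanu_id(number: int) -> str:
    """Render a ledger sequence number as a kaanu_id."""
    # Assumption: the sequence starts at 1 and must fit in six digits.
    if not 1 <= number <= 999_999:
        raise ValueError(f"sequence number out of range: {number}")
    return f"kb{number:06d}"


def is_valid_kaanu_id(value: str) -> bool:
    """Check the shape of an identifier only, not whether it is assigned."""
    return KAANU_ID_RE.fullmatch(value) is not None
```

For example, format_kaanu_id(842) gives kb000842, and is_valid_kaanu_id("kb842") is False because the numeric body is not zero-padded.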

Rule

The identifier is assigned to the intellectual record, not to a particular file path.
Once assigned, a kaanu_id is never reused.
If a record is withdrawn, merged, or suppressed, its kaanu_id remains reserved.
If two records are merged, one kaanu_id becomes canonical and the other becomes a retired identifier that redirects to the canonical record.

Assignment practice

Assign identifiers sequentially from the migration ledger.
Do not wait for final publication to assign a kaanu_id.
The kaanu_id should be stored in:
- the migration ledger
- Omeka as a persistent identifier field
- any redirect map
- any future export or API feed
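
Sequential assignment without reuse can be sketched as below. This is an illustration under assumptions: next_kaanu_id is a hypothetical helper, and the caller is expected to pass every kaanu_id ever recorded in the ledger, including withdrawn, merged, and suppressed ones, so that reserved identifiers are never handed out again.

```python
from typing import Iterable


def next_kaanu_id(assigned_ids: Iterable[str]) -> str:
    """Return the next unused kaanu_id after the highest one ever assigned."""
    # Strip the "kb" prefix and take the highest numeric body seen so far.
    # Retired identifiers must be included in assigned_ids to stay reserved.
    highest = max((int(kaanu_id[2:]) for kaanu_id in assigned_ids), default=0)
    if highest >= 999_999:
        raise ValueError("six-digit identifier space is exhausted")
    return f"kb{highest + 1:06d}"
```

Taking the maximum rather than the first gap is deliberate: gaps may belong to withdrawn records whose identifiers remain reserved.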

Permanent URL rule

Kaanu should use one identifier-based canonical public URL for each item.

Canonical item URL

Pattern:

https://kaanu.org/bib/{kaanu_id}

Examples:

https://kaanu.org/bib/kb000001
https://kaanu.org/bib/kb000842

Canonical PDF URL

Pattern:

https://kaanu.org/bib/{kaanu_id}/download

Example:

https://kaanu.org/bib/kb000001/download
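
Because both URLs are derived purely from the kaanu_id, they can be generated rather than typed by hand. A minimal sketch follows; the function names are hypothetical, and the base URL is taken from the patterns above.

```python
CANONICAL_BASE = "https://kaanu.org/bib"


def canonical_item_url(kaanu_id: str) -> str:
    """Canonical public URL of the bibliographic record."""
    return f"{CANONICAL_BASE}/{kaanu_id}"


def canonical_pdf_url(kaanu_id: str) -> str:
    """Canonical URL of the record's PDF, meaningful only if the PDF is public."""
    return f"{canonical_item_url(kaanu_id)}/download"
```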

Why this rule

The URL does not depend on title wording.
The URL does not depend on Quartz or Omeka path structure.
The URL does not expose implementation details like /item/847.
The bibliographic record remains stable even if the file becomes private or is replaced.

What must not be canonical

These may exist internally or as redirects, but should not be the long-term public identifier:

Omeka numeric item URLs such as /item/10
title-only slug URLs
paths with platform names like /omeka/, /quartz/, /new/
file extensions like .html or .php
workflow status markers such as draft, review, or latest
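
One way to catch non-canonical URLs before they reach the ledger or a sitemap is to test against the canonical pattern and reject everything else. The check below is a sketch; is_canonical_item_url is a hypothetical name, not an existing script.

```python
import re

# Only https://kaanu.org/bib/kb###### counts as a canonical item URL.
CANONICAL_ITEM_RE = re.compile(r"https://kaanu\.org/bib/kb[0-9]{6}")


def is_canonical_item_url(url: str) -> bool:
    """True only for identifier-based item URLs on the final domain."""
    return CANONICAL_ITEM_RE.fullmatch(url) is not None
```

An allow-list is used instead of a list of forbidden patterns, so Omeka numeric URLs, title slugs, platform path segments, file extensions, and status markers are all rejected without being enumerated.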

Redirect rule

Every known legacy bibliography URL should redirect to the canonical Kaanu item URL.

Redirect policy

Use 301 (permanent) redirects once cutover is final.
Build redirects from the migration ledger, not ad hoc in server config.
Redirect from old Quartz title pages to https://kaanu.org/bib/{kaanu_id}.
Do not redirect directly to raw PDF URLs.

If an item is merged

old kaanu_id redirects to the surviving canonical kaanu_id
legacy Quartz URLs for both records redirect to the surviving canonical record
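
A redirect map that honours merges could be generated from the ledger roughly as follows. This is a sketch under assumptions: the merged_into mapping (retired kaanu_id to surviving kaanu_id) is hypothetical and is not a ledger column listed in this handbook, and the /bib/{kaanu_id} source path for retired identifiers is illustrative. The kaanu_id and legacy_path_or_url keys are ledger columns.

```python
from typing import Dict, Iterable, Mapping


def resolve_canonical(kaanu_id: str, merged_into: Mapping[str, str]) -> str:
    """Follow merge links until reaching the surviving canonical kaanu_id."""
    seen = set()
    while kaanu_id in merged_into:
        if kaanu_id in seen:
            raise ValueError(f"merge cycle involving {kaanu_id}")
        seen.add(kaanu_id)
        kaanu_id = merged_into[kaanu_id]
    return kaanu_id


def build_redirects(
    rows: Iterable[Mapping[str, str]], merged_into: Mapping[str, str]
) -> Dict[str, str]:
    """Map each legacy path and each retired kaanu_id path to a canonical item URL."""
    redirects: Dict[str, str] = {}
    for row in rows:
        survivor = resolve_canonical(row["kaanu_id"], merged_into)
        legacy = row.get("legacy_path_or_url")
        if legacy:
            # Always target the item page, never a raw PDF URL.
            redirects[legacy] = f"https://kaanu.org/bib/{survivor}"
    for retired in merged_into:
        survivor = resolve_canonical(retired, merged_into)
        redirects[f"/bib/{retired}"] = f"https://kaanu.org/bib/{survivor}"
    return redirects
```

Resolving merges to the final survivor keeps every redirect a single hop, even when a record has been merged more than once.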

Canonical metadata rule

Each item should eventually carry these core fields:

kaanu_id
canonical item URL
canonical PDF URL if public
canonical DOI (Zenodo-minted), held in bibo:doi
pre-existing publisher DOI(s) for cross-reference, held in kaanu:otherDoi
legacy source path or URL
Zotero bridge identifier
Omeka internal item ID
PDF rights status

Required ledger columns

The migration ledger should now track the following minimum fields:

kaanu_id
canonical_item_url
canonical_pdf_url
kaanu_doi (the Zenodo-minted DOI, populated after the Phase 2D-4 mint pass)
other_doi (any pre-existing publisher DOI, semicolon-separated if more than one)
omeka_item_id
zotero_bridge_identifier
legacy_path_or_url
legacy_rel_path
legacy_title
year_guess
source_type
pdf_present
pdf_url
pdf_rights_status
public_file_allowed
original_source_url
candidate_zotero_item
import_status
redirect_needed
redirect_target_url
duplicate_group
notes

The template at migration/templates/migration_ledger_template.csv has been expanded to reflect this.
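
A ledger file can be checked against this minimum column set before each wave. The following is a sketch only: missing_ledger_columns is a hypothetical helper, and it assumes the ledger is a CSV whose first row is the header.

```python
import csv
import io
from typing import List

REQUIRED_COLUMNS = [
    "kaanu_id", "canonical_item_url", "canonical_pdf_url", "kaanu_doi",
    "other_doi", "omeka_item_id", "zotero_bridge_identifier",
    "legacy_path_or_url", "legacy_rel_path", "legacy_title", "year_guess",
    "source_type", "pdf_present", "pdf_url", "pdf_rights_status",
    "public_file_allowed", "original_source_url", "candidate_zotero_item",
    "import_status", "redirect_needed", "redirect_target_url",
    "duplicate_group", "notes",
]


def missing_ledger_columns(csv_text: str) -> List[str]:
    """Return required columns absent from the CSV header, in ledger order."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]
```

An empty result means the header covers the minimum set; extra columns are allowed.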

DOI policy

Every Kaanu record receives a Zenodo-minted DOI as its canonical DOI. Pre-existing publisher DOIs (CrossRef, or DataCite DOIs minted elsewhere) are not used as the canonical Kaanu DOI; they are captured separately for cross-reference.

Two fields, two roles

bibo:doi is the canonical Kaanu DOI, minted by Zenodo on deposit. Single value per record. Populated by the Phase 2D-4 mint pass and on every new accession thereafter.
kaanu:otherDoi is multivalued. Holds any pre-existing publisher DOI, in DOI-name form (10.NNNN/...). Empty for records without a prior DOI.

Order of operations

record_other_dois.py runs first on any record set that has been imported from Zotero or other sources. It pattern-matches DOI strings in dcterms:source, dcterms:identifier, dcterms:bibliographicCitation, and any imported Zotero DOI field, normalises them, and writes them to kaanu:otherDoi.
mint_zenodo_dois.py then deposits each record into the Kaanu community on Zenodo and writes the minted DataCite DOI back to bibo:doi.
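
The pattern-match and normalise step can be illustrated as follows. This is not the code of record_other_dois.py; it is a sketch of the general technique, and the regular expression, the trailing-punctuation stripping, and the lowercasing are all assumptions about what "normalise" means here.

```python
import re
from typing import List, Optional

# A DOI name: "10.", a 4 to 9 digit registrant code, "/", then a suffix.
DOI_RE = re.compile(r'10\.[0-9]{4,9}/[^\s"<>]+')


def extract_dois(*values: Optional[str]) -> List[str]:
    """Find DOI names in free-text field values and return them de-duplicated."""
    found: List[str] = []
    for value in values:
        for match in DOI_RE.findall(value or ""):
            # Drop sentence punctuation that trails the DOI in citations.
            # Lowercasing is an assumption, made because DOIs are
            # case-insensitive and this simplifies de-duplication.
            doi = match.rstrip(".,;:)]}").lower()
            if doi not in found:
                found.append(doi)
    return found
```

Because the pattern starts at "10.", resolver prefixes such as https://doi.org/ or doi: are dropped automatically, leaving the DOI-name form that kaanu:otherDoi expects. For the ledger's other_doi column, the result can be joined with semicolons.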

Zenodo fair-usage caveat

Zenodo has a fair-usage policy for bulk deposit. The relevant red flags for a project of Kaanu’s shape are: dividing a single dataset into many records to circumvent upload limits, uploading very large numbers of records independent of data volume, and uploading content whose main purpose is indexing, archiving, or promotion. Kaanu is a curated bibliographic archive of around 2,000 records. Before the first bulk run, write to Zenodo to agree the use case upfront.

Editor backstop

If Zenodo declines a record that already has a publisher DOI, that record keeps its existing publisher DOI in bibo:doi, and its kaanu:otherDoi is left empty. This is the only exception to the Zenodo-for-all rule; the invariant that every Kaanu record has exactly one bibo:doi still holds.

Display rule

The item page shows the canonical DOI (bibo:doi) on the headline citation line. If kaanu:otherDoi is populated, a “Publisher DOI” line appears below it. The user-visible distinction is “this is Kaanu’s DOI” versus “this DOI was assigned by the original publisher”.
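
The display rule reduces to a small function. The sketch below is illustrative only: doi_display_lines is a hypothetical name, the "DOI" label on the headline line is an assumption, and the "Publisher DOI" label comes from the rule above.

```python
from typing import List, Sequence


def doi_display_lines(bibo_doi: str, other_dois: Sequence[str]) -> List[str]:
    """Return the DOI lines for an item page, canonical DOI first."""
    lines = []
    if bibo_doi:
        # Headline line: the single canonical DOI held in bibo:doi.
        lines.append(f"DOI: {bibo_doi}")
    if other_dois:
        # Shown only when kaanu:otherDoi is populated.
        lines.append("Publisher DOI: " + "; ".join(other_dois))
    return lines
```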

Editorial workflow

For each incoming or migrated record:

Create or confirm the ledger row.
Assign a kaanu_id.
Set the canonical item URL from the kaanu_id.
Decide whether the file is:
- metadata only
- public PDF allowed
- private/off-platform
Clean the metadata in Zotero.
Import into Omeka.
Confirm the Omeka item is mapped back to the ledger row.
Run record_other_dois.py to capture any pre-existing publisher DOI into kaanu:otherDoi.
Run mint_zenodo_dois.py to mint and store the canonical DOI in bibo:doi.
Add redirect coverage for any known legacy URL.

PDF policy

The item page is the permanent object of record.

The PDF is only one representation of that object.

This means:

if a PDF is public, expose it at .../download
if a PDF later needs to be withdrawn, keep the item URL alive
if a better file replaces the original, keep the item URL unchanged

Batch import policy

Do not split the migration into two disconnected passes:

metadata first by one process
PDFs later by manual cleanup

Instead, aim for one repeatable batch pipeline:

cleaned metadata CSV
encoded PDF URL where allowed
Omeka import in controlled waves

Recommended wave size after pilot validation:

50 to 150 records per run
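
Splitting cleaned rows into waves of this size is a simple chunking step. The sketch below uses a hypothetical plan_waves helper; the default of 100 is an assumption chosen from the middle of the recommended range.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def plan_waves(records: Sequence[T], wave_size: int = 100) -> List[List[T]]:
    """Split records into consecutive import waves of at most wave_size each."""
    if not 50 <= wave_size <= 150:
        raise ValueError("wave_size should stay within the 50 to 150 range")
    return [
        list(records[start:start + wave_size])
        for start in range(0, len(records), wave_size)
    ]
```

The final wave may be smaller than 50 when the record count does not divide evenly.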

Search and indexing policy

To avoid indexing problems:

expose only one canonical public item URL per record
add canonical tags to public item pages at final domain cutover
keep staging domains out of indexing where appropriate
generate sitemap entries from canonical item URLs only
keep redirects stable and permanent
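
Generating the sitemap from kaanu_id values alone guarantees that only canonical item URLs appear in it. A minimal sketch using the Python standard library follows; build_sitemap is a hypothetical helper, and the caller is assumed to pass only public, non-retired identifiers.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(kaanu_ids: Iterable[str]) -> str:
    """Build a sitemap XML string with one canonical item URL per kaanu_id."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # De-duplicate and sort so repeated runs produce identical output.
    for kaanu_id in sorted(set(kaanu_ids)):
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"https://kaanu.org/bib/{kaanu_id}"
    return ET.tostring(urlset, encoding="unicode")
```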

Rolling notes

2026-04-25

Adopted Zenodo-for-all DOI strategy: every Kaanu record receives a Zenodo-minted DOI in bibo:doi
Added kaanu:otherDoi to the Kaanu vocabulary (literal, multivalued) for any pre-existing publisher DOI
Added two scripts to the migration toolkit: record_other_dois.py (captures pre-existing DOIs) and mint_zenodo_dois.py (mints canonical Zenodo DOI and writes back to bibo:doi)
Removed the earlier three-tier DOI plan (existing then CrossRef then Zenodo); CrossRef enrichment is no longer in scope
Added Zenodo fair-usage caveat: write to Zenodo for an upfront agreement before the first bulk run
Editor backstop documented: if Zenodo declines records with existing publisher DOIs, those keep their existing DOI in bibo:doi
Expanded migration ledger to include kaanu_doi and other_doi columns

2026-04-19

Adopted permanent Kaanu bibliographic identifier format: kb000001
Adopted canonical item URL pattern: https://kaanu.org/bib/{kaanu_id}
Adopted canonical PDF URL pattern: https://kaanu.org/bib/{kaanu_id}/download
Expanded migration ledger template to include kaanu_id, canonical URLs, rights fields, and redirect target
Agreed that title-slug URLs may exist as aliases later, but must not be canonical
