Uploading Books

How to upload ebooks to your Colibri library

11 min read

Colibri supports uploading EPUB, MOBI, and PDF ebooks with automatic metadata extraction and enrichment from external sources.

Supported Formats

Colibri can read and store the following ebook formats:

Format	Extension	Metadata Support	Cover Extraction	Features
EPUB	`.epub`	Full support	Yes	Best format, native support for all metadata fields including series, authors, publisher, description, and subjects
MOBI	`.mobi`, `.azw`, `.azw3`	Full support	Yes	Full metadata including Amazon-specific fields, creator information, and cover images
PDF	`.pdf`	Limited	Sometimes	Basic metadata (title, author, subject), cover extraction from first page

Format Recommendations

EPUB: Recommended for fiction and non-fiction. Best metadata support and smallest file sizes.
MOBI/AZW: Good for Kindle books. Full metadata support but larger files.
PDF: Suitable for technical books and documents, but limited metadata extraction.

Upload Methods

Single Upload

Upload one book at a time with immediate metadata review:

Click the Upload button in the sidebar (or press Cmd+U / Ctrl+U)
Drag and drop a file onto the upload area, or click Browse to select a file
Wait for the file to upload and metadata to be extracted (usually 2-5 seconds)
Review the extracted metadata:
- Title
- Authors/Contributors
- Cover image
- Publisher and publication date
- Description/Synopsis
- ISBN and other identifiers
- Language
- Series information (if available)
Edit any fields as needed
Click Enrich Metadata to fetch additional data from external sources (optional)
Click Save to add the book to your library

When to use single upload:

You want to carefully review each book’s metadata
You’re uploading rare or unusual books that may need manual metadata corrections
You want to immediately add the book to specific collections

Bulk Upload

Upload multiple books at once for faster library building:

Click the Upload button in the sidebar
Select multiple files at once:
- Drag and drop: Select multiple files in your file explorer and drag them to the upload area
- File browser: Hold Cmd (Mac) or Ctrl (Windows/Linux) while selecting multiple files
Files are added to the upload queue and processed automatically
Monitor progress in the Queue Status Widget (bottom-right corner):
- Shows total files in queue
- Current file being processed
- Processing status (extracting, enriching, saving)
- Any errors encountered
Each book is processed sequentially to avoid overwhelming the database
Review and edit metadata for each book as it completes processing
Click Save on each book to add it to your library

Queue Processing:

Files are processed one at a time to ensure data integrity
Average processing time: 5-10 seconds per book (depending on enrichment)
Queue persists across page refreshes
You can continue browsing while uploads process in the background

When to use bulk upload:

You’re importing an existing library
You have many books with good embedded metadata
You trust the automatic enrichment process

Upload from CLI

Advanced users can upload books using the Colibri CLI:

bash

# Upload a single book
colibri works add ./path/to/book.epub

# Upload all books in a directory
colibri works import ./path/to/books/

See the CLI Documentation for more details.

Automatic Metadata Extraction

When you upload a book, Colibri automatically extracts embedded metadata from the ebook file:

Core Metadata

Title: Main title and subtitle (if available)
Sort Title: Automatically generated for proper alphabetical sorting (e.g., “Great Gatsby, The” for “The Great Gatsby”)
Authors: Creator names with roles (author, editor, translator, illustrator, etc.)
Publisher: Publishing company name
Publication Date: Original publication date or edition date
Language: ISO 639-1 language code (e.g., “en” for English, “de” for German)

Identifiers

ISBN: ISBN-10 and/or ISBN-13 (automatically validated and normalized)
ASIN: Amazon Standard Identification Number (for Kindle books)
DOI: Digital Object Identifier (for academic publications)
LCCN: Library of Congress Control Number

Content Metadata

Description: Book synopsis or summary
Subjects/Tags: Categories, genres, and topics (automatically parsed from BISAC codes if present)
Series: Series name and position/volume number
Page Count: Number of pages (for EPUB and MOBI)

Visual Elements

Cover Image: Extracted from the ebook file and automatically generated thumbnails
Blurhash: Low-resolution preview for fast loading

Format-Specific Extraction

EPUB:

Reads metadata from content.opf file
Extracts Dublin Core metadata
Parses Calibre metadata extensions
Supports series information from meta tags

MOBI:

Reads metadata from EXTH records
Extracts Amazon-specific fields
Supports creator information and contributor roles

PDF:

Reads metadata from PDF Info dictionary
Limited to basic fields (title, author, subject, keywords)
Attempts to extract cover from first page

Metadata Enrichment

After extracting embedded metadata, Colibri can automatically enrich it with data from external sources.

What Gets Enriched?

Enrichment fills in missing fields and improves existing metadata:

Missing authors: Find creator names from ISBN lookup
Missing publisher: Discover publisher information
Missing description: Add synopsis from book databases
Missing subjects: Add genre tags and categories
Missing series: Detect series relationships
Cover image: Find higher-quality covers
Publication details: Add precise publication dates
Identifiers: Discover additional ISBNs and identifiers

Enrichment Process

Trigger enrichment:
- Automatically during upload (if enabled in settings)
- Manually by clicking Enrich Metadata on the upload form
- Later from the book detail page by clicking Fetch Metadata
Provider selection:
- Colibri queries multiple metadata providers in parallel
- Default providers: Open Library, WikiData, Library of Congress
- Additional providers (requires API keys): Google Books, ISBNdb, Springer
Data aggregation:
- Results from all providers are collected
- Confidence scores are calculated for each field
- Conflicts are detected and resolved automatically
Preview and apply:
- Review suggested changes before applying
- Each field shows the source and confidence score
- Accept all changes, or selectively apply specific fields
- Manually edit any values before saving

See Metadata Enrichment for detailed information about the enrichment system.

Enrichment Strategies

Conservative (Default):

Only fills in missing fields
Does not overwrite existing metadata
Best for books with good embedded metadata

Aggressive:

Overwrites existing metadata with higher-confidence data
Replaces all fields if external sources have better data
Best for books with poor or missing metadata

Merge:

Combines embedded and external metadata
Keeps all unique subjects/tags
Adds additional authors and identifiers
Best for comprehensive metadata coverage

Configure enrichment strategy in Settings > Instance > Metadata.

Duplicate Detection

Colibri automatically detects potential duplicates to prevent importing the same book multiple times.

Detection Methods

1. Exact File Match

Compares SHA-256 hash of the file content
Confidence: 100% (identical file)
Action: Skip upload by default, or prompt to replace

2. ISBN Match

Compares ISBN-10 or ISBN-13 identifiers
Confidence: 95%+ (same edition)
Action: Add as new edition or prompt
Note: Different formats (EPUB vs MOBI) of the same book will match

3. Fuzzy Title and Author Match

Compares normalized titles and author names using Levenshtein distance
Ignores case, punctuation, and common words (a, an, the)
Confidence: 70-90% (depends on similarity)
Action: Always prompts for review

Duplicate Handling Options

When a duplicate is detected, you can:

Skip Upload
- Discard the new file
- Keep the existing book
- No changes to your library
Replace Existing
- Delete the old file
- Upload the new file
- Keep existing metadata (unless you choose to re-enrich)
- Use when you have a better quality file
Add as New Edition
- Keep both files
- Link them as different editions of the same work
- Useful for different formats (EPUB and PDF) or different editions (original and annotated)
- Share metadata between editions
Import Anyway
- Treat as a completely separate work
- Useful for false positive matches
- Creates a new work entry

Duplicate Detection Settings

Configure detection sensitivity in Settings > Instance > Library:

Strict: Only exact ISBN or file hash matches
Normal (Default): ISBN matches and high-confidence title/author matches (85%+)
Relaxed: Include lower-confidence fuzzy matches (70%+)
Off: Disable duplicate detection entirely

Queue Status Monitoring

The Queue Status Widget in the bottom-right corner shows real-time upload progress.

Idle

No uploads in progress
Widget is minimized or hidden

Processing

Shows current file name and progress
Displays processing stage:
- Uploading: Transferring file to server
- Extracting: Reading embedded metadata
- Enriching: Querying external sources
- Saving: Writing to database
Progress bar shows completion percentage

Error

Red indicator with error count
Click to expand and see error details
Options to retry failed uploads or skip them

Queue Actions

Expand/Collapse: Click widget to show/hide details
Pause: Temporarily stop processing
Resume: Continue processing after pause
Clear Queue: Remove all pending uploads
Retry Failed: Attempt to re-process failed uploads

Queue Persistence

Queue state is saved to browser storage
Survives page refreshes and browser restarts
Automatically resumes when you return to Colibri
Queue is per-user (not shared across devices)

Error Handling and Troubleshooting

Common Upload Errors

File Too Large

Error: “File exceeds maximum size limit”
Cause: File is larger than 100 MB
Solution: Use a file compression tool or split into smaller files
Note: Administrators can adjust the limit in instance settings

Unsupported Format

Error: “File type not supported”
Cause: File is not a valid EPUB, MOBI, or PDF
Solution: Convert the file to a supported format using Calibre or similar tools

Corrupt File

Error: “Unable to read file” or “Invalid ebook format”
Cause: File is damaged or incomplete
Solution: Re-download the file or obtain a new copy

Metadata Extraction Failed

Error: “Could not extract metadata”
Cause: File has malformed metadata or uses an unusual format
Solution: Upload will continue, but you’ll need to enter metadata manually

Enrichment Timeout

Error: “Metadata enrichment timed out”
Cause: External providers are slow or unavailable
Solution: Skip enrichment and add metadata manually, or retry later

Storage Error

Error: “Failed to save file to storage”
Cause: S3 storage is misconfigured or unreachable
Solution: Contact your administrator to check storage settings

Retry Logic

Colibri automatically retries transient errors:

Network errors: 3 retries with exponential backoff
Server errors (5xx): 2 retries
Client errors (4xx): No retries (requires manual intervention)

Getting Help

If you encounter persistent upload errors:

Check the browser console for detailed error messages (F12 → Console tab)
Try uploading a different file to isolate the issue
Verify your storage configuration in Settings
Contact your instance administrator
Report bugs at https://github.com/colibri-hq/colibri/issues

Format Conversion Tips

Converting to EPUB

If you have books in unsupported formats, convert them to EPUB:

Using Calibre (Recommended):

bash

# Install Calibre
# Visit https://calibre-ebook.com/

# Convert single file
ebook-convert input.azw output.epub

# Convert with metadata
ebook-convert input.pdf output.epub --authors "Author Name" --title "Book Title"

Using Pandoc (for text-based formats):

bash

# Install Pandoc
# Visit https://pandoc.org/

# Convert Markdown to EPUB
pandoc input.md -o output.epub --metadata title="Book Title"

Preserving Metadata During Conversion

Use Calibre’s metadata editor before converting
Export metadata from original file format
Re-embed metadata after conversion
Manually verify metadata after upload

Best Practices

Before Uploading

Organize files: Use consistent naming (e.g., “Author - Title.epub”)
Pre-clean metadata: Use Calibre to fix obvious metadata errors
Check for duplicates: Search your library before uploading
Verify file integrity: Ensure files are not corrupt

During Upload

Use enrichment: Let Colibri fill in missing metadata automatically
Review carefully: Check that automatic metadata is accurate
Add to collections: Organize books immediately during upload
Tag appropriately: Add custom tags for better discoverability

After Upload

Verify cover images: Ensure covers loaded correctly
Check series relationships: Verify series order if applicable
Add reviews: Rate and review books you’ve read
Share collections: Make your curated lists public (if desired)

Keyboard Shortcuts

Speed up your upload workflow with keyboard shortcuts:

Cmd/Ctrl + U: Open upload modal
Cmd/Ctrl + V: Paste files from clipboard (if supported by browser)
Cmd/Ctrl + Enter: Save and add book to library
Escape: Cancel upload and close modal

Next Steps

After uploading your books:

Organize with Collections: Create custom reading lists
Enrich Metadata: Improve book information
Rate and Review: Share your opinions
Search Your Library: Find books in your library

Supported Formats ​

Format Recommendations ​

Upload Methods ​

Single Upload ​

Bulk Upload ​

Upload from CLI ​

Automatic Metadata Extraction ​

Core Metadata ​

Identifiers ​

Content Metadata ​

Visual Elements ​

Format-Specific Extraction ​

Metadata Enrichment ​

What Gets Enriched? ​

Enrichment Process ​

Enrichment Strategies ​

Duplicate Detection ​

Detection Methods ​

Duplicate Handling Options ​

Duplicate Detection Settings ​

Queue Status Monitoring ​

Widget States ​

Queue Actions ​

Queue Persistence ​

Error Handling and Troubleshooting ​

Common Upload Errors ​

Retry Logic ​

Getting Help ​

Format Conversion Tips ​

Converting to EPUB ​

Preserving Metadata During Conversion ​

Best Practices ​

Before Uploading ​

During Upload ​

After Upload ​

Keyboard Shortcuts ​

Next Steps ​

Related Documentation ​

Supported Formats

Format Recommendations

Upload Methods

Single Upload

Bulk Upload

Upload from CLI

Automatic Metadata Extraction

Core Metadata

Identifiers

Content Metadata

Visual Elements

Format-Specific Extraction

Metadata Enrichment

What Gets Enriched?

Enrichment Process

Enrichment Strategies

Duplicate Detection

Detection Methods

Duplicate Handling Options

Duplicate Detection Settings

Queue Status Monitoring

Widget States

Queue Actions

Queue Persistence

Error Handling and Troubleshooting

Common Upload Errors

Retry Logic

Getting Help

Format Conversion Tips

Converting to EPUB

Preserving Metadata During Conversion

Best Practices

Before Uploading

During Upload

After Upload

Keyboard Shortcuts

Next Steps

Related Documentation