Paperless-ngx

productivity, privacy

Self-hosted document management with OCR, full-text search, and automatic tagging. Scan a document once and find it in seconds by content, date, correspondent, or tag. An active community fork of the original Paperless.

#documents #ocr #scanning #filing #search #self-hosted #paperless

Quick Start

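curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docker/compose/docker-compose.mariadb-tika.yml -o docker-compose.yml && \
curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docker/compose/docker-compose.env -o docker-compose.env && \
docker compose pull && \
docker compose run --rm webserver createsuperuser && \
docker compose up -d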

Overview

Paperless-ngx is a self-hosted document management system that replaces physical filing and cloud storage with a searchable, tagged archive on your own hardware. Every document you add — whether scanned from paper, uploaded as a PDF, or ingested from email — is run through Tesseract OCR, indexed for full-text search, and stored with metadata that makes it findable in seconds.

The search works the way you actually remember documents. Search for “electricity July” and it finds your July electricity bill. Search for a contract party’s name and it surfaces every document that mentions them. You are not searching filenames or folder structures — you are searching the actual content of the documents.
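The same content search can also be driven through the REST API. The sketch below only builds the request and does not send it. The base URL `http://localhost:8000`, the token value, and the helper name `build_search_request` are placeholders chosen for illustration; check the `query` parameter and the token header against the API documentation for your installed version.

```python
import urllib.parse
import urllib.request


def build_search_request(base_url, token, query):
    """Build (but do not send) a full-text search request.

    base_url and token are placeholders for your own instance and API token.
    """
    # URL-encode the search terms, e.g. "electricity July" -> "electricity+July"
    params = urllib.parse.urlencode({"query": query})
    url = f"{base_url.rstrip('/')}/api/documents/?{params}"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Token {token}",
            "Accept": "application/json",
        },
    )


# Hypothetical instance address and token, for illustration only.
req = build_search_request("http://localhost:8000", "YOUR_API_TOKEN", "electricity July")
```

Passing `req` to `urllib.request.urlopen` would send the search; the response is expected to be JSON with the matching documents, which you can parse with the `json` module.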

Auto-tagging is the feature that makes the system low-maintenance over time. Paperless-ngx uses a machine learning model trained on your own filing patterns to assign tags, correspondents, and document types automatically. After you have manually classified a few hundred documents, the model handles most incoming documents without any manual input.
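To make the idea of learning from your own filing concrete, here is a deliberately tiny word-count classifier. It is not the model Paperless-ngx ships and it is far cruder than a real classifier; the function names `train` and `suggest_tag` and the sample documents are invented for illustration. It only shows why a batch of manually tagged documents has to exist before suggestions become useful.

```python
from collections import Counter, defaultdict


def train(examples):
    """Count which words appear under each tag.

    examples: iterable of (document_text, tag) pairs you filed by hand.
    """
    counts = defaultdict(Counter)
    for text, tag in examples:
        counts[tag].update(text.lower().split())
    return counts


def suggest_tag(counts, text):
    """Return the tag whose past documents share the most words with text."""
    if not counts:
        return None  # nothing has been filed yet, so there is nothing to learn from
    words = text.lower().split()

    def score(tag):
        total = sum(counts[tag].values())
        # Fraction of the tag's vocabulary mass matched by the new document
        return sum(counts[tag][w] for w in words) / total if total else 0.0

    return max(counts, key=score)


# Hypothetical, hand-tagged history.
history = [
    ("electricity bill july kwh", "utilities"),
    ("bank statement account balance", "bank"),
    ("electricity meter reading", "utilities"),
]
model = train(history)
suggestion = suggest_tag(model, "your electricity bill for august")
```

With no history the function returns `None`, which mirrors the cold-start behaviour described above: the system cannot classify until you have shown it examples.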

Documents can arrive through a watched folder (connect your scanner’s output directory), email ingestion from a monitored mailbox, the web UI, the REST API, or a mobile upload from any scanner app that supports document sharing.
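As one example of these paths, an upload through the REST API might be sketched as below. The code only assembles a multipart request and does not send it. The base URL, token, file name, and the helper name `build_upload_request` are assumptions for illustration, and the endpoint path and form field names should be verified against the API documentation for your version.

```python
import urllib.request
import uuid


def build_upload_request(base_url, token, filename, content, title=None):
    """Build (but do not send) a multipart upload of one document.

    content is the raw file bytes; base_url and token are placeholders.
    """
    boundary = uuid.uuid4().hex
    parts = []
    if title is not None:
        # Optional plain-text form field carrying the document title
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="title"\r\n\r\n'
            f'{title}\r\n'.encode()
        )
    # The file itself goes in the "document" form field
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + content
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/documents/post_document/",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Hypothetical values, for illustration only.
upload = build_upload_request(
    "http://localhost:8000", "YOUR_API_TOKEN", "invoice.pdf", b"%PDF-1.4 ...", title="Invoice"
)
```

For a scanner that writes to disk, the watched folder is usually simpler: saving the file into the consume directory is enough, and no API call is needed.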

The deployment requires several cooperating services: a web server, Celery task workers, Redis for queuing, and a database. The official Docker Compose setup handles this, but it is not a single-container install.

Paperless-ngx: Pros & Cons

Pros (The Wins)
Full-text OCR search: Find any document by its actual content, not filename.
Auto-tagging ML: Learns your filing patterns; classifies new docs automatically.
Multiple ingestion paths: Email, watched folder, API, and mobile upload all work.
41.7k stars: Most active self-hosted document management system.

Cons (The Friction)
Multi-container setup: Redis, Celery, and database needed alongside the web server.
OCR scan quality: Poor-resolution scans produce poor search results.
Training required: Auto-tagging needs a batch of manually tagged docs first.
No scanner app: Relies on third-party mobile apps for phone scanning.

Use Cases

Specific ways to use Paperless-ngx for your workflow.

01 Scan all paper mail and invoices and find any document instantly by searching its content
02 Build an automatic filing system where bank statements, contracts, and receipts are tagged on arrival
03 Create a searchable archive of scanned documents without uploading them to Google Drive or Dropbox
04 Ingest documents from email directly so anything sent to a monitored inbox gets filed automatically

Deployment Strategy

Recommended ways to host Paperless-ngx in your own environment.

docker
self-hosted