Paperless-ngx
Self-hosted document management with OCR, full-text search, and automatic tagging. Scan a document once and find it in seconds by content, date, correspondent, or tag. Active community fork of the original Paperless.
Quick Start
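curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docker/compose/docker-compose.mariadb-tika.yml -o docker-compose.yml && \
  curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docker/compose/docker-compose.env -o docker-compose.env && \
  docker compose pull && \
  docker compose run --rm webserver createsuperuser && \
  docker compose up -d
Overview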
Paperless-ngx is a self-hosted document management system that replaces physical filing and cloud storage with a searchable, tagged archive on your own hardware. Every document you add — whether scanned from paper, uploaded as a PDF, or ingested from email — is run through Tesseract OCR, indexed for full-text search, and stored with metadata that makes it findable in seconds.
The search works the way you actually remember documents. Search for “electricity July” and it finds your July electricity bill. Search for a contract party’s name and it surfaces every document that mentions them. You are not searching filenames or folder structures — you are searching the actual content of the documents.
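As a rough illustration of content search from a script, the sketch below builds a full-text query against the REST API. It assumes an instance at http://localhost:8000, a placeholder API token, and the `query` parameter on `/api/documents/`; check these against the API documentation for your installed version before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url: str, token: str, query: str) -> Request:
    """Build (but do not send) a full-text search request."""
    # Assumption: documents are searched via the `query` parameter.
    url = f"{base_url.rstrip('/')}/api/documents/?{urlencode({'query': query})}"
    headers = {
        "Authorization": f"Token {token}",  # token-based authentication
        "Accept": "application/json",
    }
    return Request(url, headers=headers)


# Hypothetical host and token, for illustration only.
req = build_search_request("http://localhost:8000", "YOUR_API_TOKEN", "electricity July")
print(req.full_url)
# http://localhost:8000/api/documents/?query=electricity+July
```

Passing the request to `urllib.request.urlopen` would send it. The example stops short of that so it runs without a live server.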
Auto-tagging is the feature that makes the system low-maintenance over time. Paperless-ngx uses a machine learning model trained on your own filing patterns to assign tags, correspondents, and document types automatically. After you have manually classified a few hundred documents, the model handles most incoming documents without any manual input.
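To make the idea of learning from your own filing concrete, here is a toy word-count classifier. It is not the model Paperless-ngx ships. The `TinyTagger` class and the sample documents are invented for illustration, and naive Bayes is swapped in simply because it is short enough to read in one sitting.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyTagger:
    """Toy multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> documents seen
        self.vocab = set()

    def learn(self, text, label):
        """Record one manually classified document."""
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def predict(self, text):
        """Return the most likely label, or None if nothing was learned."""
        # Words never seen during training carry no information; skip them.
        words = [w for w in tokenize(text) if w in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


tagger = TinyTagger()
tagger.learn("Electricity bill for July, amount due 84.10", "utilities")
tagger.learn("Electricity bill for August, meter reading attached", "utilities")
tagger.learn("Employment contract between the parties, signed", "contracts")
tagger.learn("Rental contract, signed by landlord and tenant", "contracts")

print(tagger.predict("Your electricity bill for September"))  # utilities
print(tagger.predict("Signed contract with the new tenant"))  # contracts
```

With only four training documents the predictions rest on a handful of words, which is why a real classifier needs a larger batch of manually filed documents before its suggestions become reliable.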
Documents can arrive through a watched folder (connect your scanner’s output directory), email ingestion from a monitored mailbox, the web UI, the REST API, or a mobile upload from any scanner app that supports document sharing.
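The watched-folder path is the simplest one to script: anything that can copy a file can feed the archive. The sketch below assumes you know which host directory is mapped to the consumption folder in your compose file. The helper name `drop_into_consume_dir` is hypothetical, and the demo uses a temporary directory so it runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path


def drop_into_consume_dir(source: Path, consume_dir: Path) -> Path:
    """Copy a scanned file into the watched folder and return its new path."""
    consume_dir.mkdir(parents=True, exist_ok=True)
    target = consume_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target


if __name__ == "__main__":
    # Stand-in directories; in a real setup consume_dir would be the host
    # folder mounted as the consumption directory (an assumption to verify).
    with tempfile.TemporaryDirectory() as tmp:
        scan = Path(tmp) / "scan-001.pdf"
        scan.write_bytes(b"%PDF-1.4 placeholder")
        print(drop_into_consume_dir(scan, Path(tmp) / "consume"))
```

Pointing a scanner's output directory at the same folder achieves the same result without any code.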
The deployment involves several cooperating services: a web server, Celery task workers, Redis for queuing, and a database. The official Docker Compose setup handles this, but it is not a single-container install.
Paperless-ngx: Pros & Cons
| Pros (The Wins) | Cons (The Friction) |
|---|---|
| Full-text OCR search: Find any document by its actual content, not filename. | Multi-container setup: Redis, Celery, and database needed alongside the web server. |
| Auto-tagging ML: Learns your filing patterns; classifies new docs automatically. | OCR scan quality: Poor-resolution scans produce poor search results. |
| Multiple ingestion paths: Email, watched folder, API, and mobile upload all work. | Training required: Auto-tagging needs a batch of manually tagged docs first. |
| 41.7k GitHub stars: One of the most active self-hosted document management projects. | No scanner app: Relies on third-party mobile apps for phone scanning. |
Use Cases
Specific ways to use Paperless-ngx for your workflow.
Deployment Strategy
Recommended ways to host Paperless-ngx in your own environment.