Solution
Duct turns PDFs, scans, and emails into searchable data. Condicio structures contract terms so they move between platforms without rework.
The problem
Most legal documents are trapped in formats built for print, not search. Contract terms sit in scanned PDFs. Obligation dates live in email attachments. Teams manually extract, reconcile, and re-enter the same data across contract lifecycle management (CLM) platforms, spreadsheets, and knowledge bases. The work is invisible, repetitive, and error-prone.
How it works
Multi-format ingestion pipeline: PDF, DOCX, scanned images, email archives, HTML.
Layout-aware chunking that respects document structure — section headers, clause boundaries, defined terms.
Hybrid search that combines BM25, vector embeddings, and cross-encoder reranking, because no single retrieval method covers every legal document type.
Condicio schema for structured contract output: parties, dates, financial terms, obligations, risk flags. Portable across any CLM that adopts the standard.
Citation verification at ingestion time — LegalVerify checks every reference before it enters the knowledge base.
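To make the layout-aware chunking step concrete, here is a minimal sketch, not Duct's actual implementation. It assumes the document has already been converted to plain text and that section headings look like "1.", "2.3", "Section 4", or "ARTICLE IV" at the start of a line. The `HEADING` pattern, the `Chunk` class, and `chunk_by_heading` are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical heading pattern (an assumption, not Duct's real rules):
# matches "1.", "2.3", "Section 4", or "ARTICLE IV" at the start of a line.
HEADING = re.compile(
    r"^(?:\d+(?:\.\d+)*\.?|Section\s+\d+|ARTICLE\s+[IVXLC]+)\s+\S",
    re.IGNORECASE,
)


@dataclass
class Chunk:
    heading: str
    lines: list[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(self.lines)


def chunk_by_heading(document: str) -> list[Chunk]:
    """Split text into chunks that start at each detected section heading."""
    chunks: list[Chunk] = []
    current = Chunk(heading="(preamble)")
    for line in document.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if HEADING.match(stripped):
            # Close the previous chunk unless it is an empty preamble.
            if current.lines or current.heading != "(preamble)":
                chunks.append(current)
            current = Chunk(heading=stripped)
        else:
            current.lines.append(stripped)
    if current.lines or current.heading != "(preamble)":
        chunks.append(current)
    return chunks
```

Each chunk keeps its heading, so a retrieved passage can be traced back to the clause it came from. A line-start regex is deliberately crude: a body line that begins with a number would be mistaken for a heading, and a production system would more likely rely on layout signals such as font size or indentation.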
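The hybrid search step names BM25, vector embeddings, and cross-encoder reranking but does not say how the results are combined. One common way to merge ranked lists from different retrievers is reciprocal rank fusion (RRF). The sketch below is an assumption for illustration, not a description of Duct's internals. The function name, the chunk IDs, and the constant `k = 60` (a conventional default) are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs into one fused ranking.

    Each list contributes 1 / (k + rank) to a chunk's score, so chunks
    ranked highly by more than one retriever rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical result lists from a keyword (BM25) and a vector retriever.
bm25_hits = ["c1", "c2", "c3"]
vector_hits = ["c3", "c1", "c4"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['c1', 'c3', 'c2', 'c4']
```

In a pipeline like the one described, a cross-encoder would then rescore only the top fused candidates. That step is omitted here because it requires a trained model.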