feat: import DOCX/PDF/scanned templates via DeepSeek recognition
Backend pipeline:
- POST /api/templates/import (multipart, max 25 MB)
- extract.ts: DOCX→mammoth, PDF→pdf-parse, fallback to OCR via tesseract+poppler-utils
  (pdftoppm renders pages to PNG, tesseract reads with rus+eng)
- deepseek.ts: chat completions client with strict JSON response_format
- recognize.ts: structured prompt that produces a simplified DocBody (string text);
  a postprocessor wraps the text in TipTap-compatible JSON and validates it with a zod schema
- prompt enforces placeholder substitution: {{customer.*}}, {{executor.*}},
  {{contract.number}}, {{contract.date}}, {{today}}
- error codes: NO_OCR / NO_DEEPSEEK_KEY / UNSUPPORTED_MIME / INVALID_DOC_BODY
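
The postprocessing step in recognize.ts could look roughly like the sketch below. It is an illustration only: `wrapTextAsTipTap` and `findUnknownPlaceholders` are hypothetical names, the one-paragraph-per-line rule and the placeholder regexes are assumptions, and the zod validation mentioned above is left out so the snippet has no dependencies.

```typescript
// Minimal TipTap/ProseMirror JSON node shapes used by this sketch.
type TextNode = { type: "text"; text: string };
type ParagraphNode = { type: "paragraph"; content?: TextNode[] };
type DocNode = { type: "doc"; content: ParagraphNode[] };

// Placeholders named in the commit message; the exact patterns are assumed.
const ALLOWED_PLACEHOLDERS: RegExp[] = [
  /^customer\.[A-Za-z0-9_.]+$/,
  /^executor\.[A-Za-z0-9_.]+$/,
  /^contract\.(number|date)$/,
  /^today$/,
];

// Hypothetical helper: one paragraph per non-empty line of the model's text.
function wrapTextAsTipTap(text: string): DocNode {
  const paragraphs: ParagraphNode[] = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line): ParagraphNode => ({
      type: "paragraph",
      content: [{ type: "text", text: line }],
    }));
  // Text nodes may not be empty, so blank input becomes one empty paragraph.
  const content: ParagraphNode[] =
    paragraphs.length > 0 ? paragraphs : [{ type: "paragraph" }];
  return { type: "doc", content };
}

// Hypothetical helper: list {{...}} names that are outside the allowed set.
function findUnknownPlaceholders(text: string): string[] {
  const unknown: string[] = [];
  const pattern = /\{\{\s*([^{}]+?)\s*\}\}/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const name = match[1] ?? "";
    if (!ALLOWED_PLACEHOLDERS.some((re) => re.test(name))) {
      unknown.push(name);
    }
  }
  return unknown;
}
```

A caller might reject the model output with INVALID_DOC_BODY when `findUnknownPlaceholders` returns a non-empty list; whether the real code does so is not stated in this commit.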
Dockerfile: apk add tesseract-ocr (+rus +eng data), poppler-utils, imagemagick
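
As a rough sketch of how extract.ts might choose an extractor and drive the two CLI tools installed by the Dockerfile: the function names, the 20-character threshold for "no usable text layer", and the 300 DPI render resolution are all assumptions for illustration, not values taken from this commit. Only the argument order (`pdftoppm [options] PDF-file root`, `tesseract image outputbase -l langs`) follows the tools' documented usage.

```typescript
// Error codes listed in the commit message.
type ImportErrorCode =
  | "NO_OCR"
  | "NO_DEEPSEEK_KEY"
  | "UNSUPPORTED_MIME"
  | "INVALID_DOC_BODY";

// Hypothetical error type carrying one of the codes above.
class ImportError extends Error {
  code: ImportErrorCode;
  constructor(code: ImportErrorCode, message: string) {
    super(message);
    this.code = code;
  }
}

type Extractor = "mammoth" | "pdf-parse" | "ocr";

const DOCX_MIME =
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

// Map an uploaded file's MIME type to the extractor named in the commit.
function pickExtractor(mime: string): Extractor {
  switch (mime) {
    case DOCX_MIME:
      return "mammoth";
    case "application/pdf":
      return "pdf-parse";
    case "image/png":
    case "image/jpeg":
      return "ocr";
    default:
      throw new ImportError("UNSUPPORTED_MIME", `Unsupported MIME type: ${mime}`);
  }
}

// Assumed heuristic: a PDF with almost no extracted text is treated as a scan.
function needsOcrFallback(extractedText: string, minChars: number = 20): boolean {
  return extractedText.replace(/\s+/g, "").length < minChars;
}

// Arguments for `pdftoppm`: render each page of the PDF to <outPrefix>-N.png.
function pdftoppmArgs(pdfPath: string, outPrefix: string, dpi: number = 300): string[] {
  return ["-png", "-r", String(dpi), pdfPath, outPrefix];
}

// Arguments for `tesseract`: read one image and print the text to stdout.
function tesseractArgs(imagePath: string): string[] {
  return [imagePath, "stdout", "-l", "rus+eng"];
}
```

The argument arrays are meant to be passed to something like `child_process.execFile`, which avoids shell quoting issues with uploaded file names.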
Frontend:
- Templates page: ⤴ Загрузить документ ("Upload document") → file picker (.docx,.pdf,.png,.jpg)
- doc type selector (contract/invoice/act/upd)
- import-banner with spinner shows uploading→analyzing stages
- on success navigates to /templates/:id (TemplateEdit) for review
Reuses DEEPSEEK_API_KEY pattern from Hall-planer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -35,6 +35,12 @@ DEFAULT_ORGANIZATION_ID=00000000-0000-0000-0000-000000000001
# Free for up to 10,000 requests per day.
DADATA_API_KEY=

# DeepSeek: parses uploaded templates (DOCX/PDF/scan) into a DocBody structure.
# Get a key: https://platform.deepseek.com/api_keys
DEEPSEEK_API_KEY=
# DEEPSEEK_BASE_URL=https://api.deepseek.com
# DEEPSEEK_MODEL=deepseek-chat

# --- Dev-only ---
# If 1, skips JWT verification and substitutes a fake admin.
# Refuses to start in production when this variable is set.
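
The DeepSeek variables in the hunk above could be consumed along these lines. This is a minimal sketch, not the actual deepseek.ts: `loadDeepSeekConfig` and `buildChatRequest` are hypothetical names, and treating the commented-out values as defaults is an assumption. The `/chat/completions` path and the `response_format: { type: "json_object" }` field follow DeepSeek's OpenAI-compatible API.

```typescript
type DeepSeekConfig = { apiKey: string; baseUrl: string; model: string };

// Read the three variables, falling back to the commented-out values
// from .env.example (assumed here to be the intended defaults).
function loadDeepSeekConfig(env: Record<string, string | undefined>): DeepSeekConfig {
  const apiKey = (env["DEEPSEEK_API_KEY"] ?? "").trim();
  if (apiKey === "") {
    // Mirrors the NO_DEEPSEEK_KEY error code from the commit message.
    const err = new Error("DEEPSEEK_API_KEY is not set") as Error & { code: string };
    err.code = "NO_DEEPSEEK_KEY";
    throw err;
  }
  const baseUrl = (env["DEEPSEEK_BASE_URL"] ?? "").trim() || "https://api.deepseek.com";
  const model = (env["DEEPSEEK_MODEL"] ?? "").trim() || "deepseek-chat";
  return { apiKey, baseUrl: baseUrl.replace(/\/+$/, ""), model };
}

type ChatRequest = {
  url: string;
  headers: Record<string, string>;
  body: {
    model: string;
    messages: { role: "system" | "user"; content: string }[];
    response_format: { type: "json_object" };
  };
};

// Build (but do not send) a chat completions request that asks for JSON output.
function buildChatRequest(cfg: DeepSeekConfig, systemPrompt: string, documentText: string): ChatRequest {
  return {
    url: `${cfg.baseUrl}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${cfg.apiKey}`,
    },
    body: {
      model: cfg.model,
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: documentText },
      ],
      response_format: { type: "json_object" },
    },
  };
}
```

Sending the request would then be a single `fetch(req.url, { method: "POST", headers: req.headers, body: JSON.stringify(req.body) })` call; keeping the builder pure makes it easy to unit-test without network access.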