feat: import DOCX/PDF/scanned templates via DeepSeek recognition

Backend pipeline:
- POST /api/templates/import (multipart, max 25 MB)
- extract.ts: DOCX→mammoth, PDF→pdf-parse, fallback to OCR via tesseract+poppler-utils
  (pdftoppm renders pages to PNG, tesseract reads with rus+eng)
- deepseek.ts: chat completions client with strict JSON response_format
- recognize.ts: structured prompt yields a simplified DocBody (plain string text);
  a postprocessor wraps the text in TipTap-compatible JSON and validates it with a zod schema
- prompt enforces placeholder substitution: {{customer.*}}, {{executor.*}},
  {{contract.number}}, {{contract.date}}, {{today}}
- error codes: NO_OCR / NO_DEEPSEEK_KEY / UNSUPPORTED_MIME / INVALID_DOC_BODY
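Illustrative sketch of the postprocessing step described above, not the code from recognize.ts. The names `wrapTextAsTipTapDoc` and `findUnknownPlaceholders` are hypothetical, the one-paragraph-per-line mapping is an assumption, and the real pipeline validates with zod rather than the hand-rolled check shown here. The node shapes follow the usual TipTap/ProseMirror document JSON.

```typescript
// Hypothetical postprocessing sketch; names and the line-to-paragraph
// mapping are assumptions, not taken from recognize.ts.
type TipTapNode = {
  type: string;
  text?: string;
  content?: TipTapNode[];
};

// Placeholders the prompt is said to enforce.
const ALLOWED_EXACT = ["contract.number", "contract.date", "today"];
const ALLOWED_PREFIXES = ["customer.", "executor."];

// Returns every {{...}} placeholder name that is outside the allowed set.
function findUnknownPlaceholders(text: string): string[] {
  const unknown: string[] = [];
  const re = /\{\{\s*([^{}]+?)\s*\}\}/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    const name = m[1];
    const exact = ALLOWED_EXACT.indexOf(name) !== -1;
    // A prefix match needs at least one character after the dot.
    const prefixed = ALLOWED_PREFIXES.some(
      (p) => name.indexOf(p) === 0 && name.length > p.length
    );
    if (!exact && !prefixed) unknown.push(name);
  }
  return unknown;
}

// Wraps plain text in a doc node: one paragraph per line.
// Blank lines become empty paragraphs, since empty text nodes are not valid.
function wrapTextAsTipTapDoc(text: string): TipTapNode {
  const paragraphs = text.split(/\r?\n/).map((line): TipTapNode =>
    line.trim() === ""
      ? { type: "paragraph" }
      : { type: "paragraph", content: [{ type: "text", text: line }] }
  );
  return { type: "doc", content: paragraphs };
}
```

A caller could reject the model output with INVALID_DOC_BODY when `findUnknownPlaceholders` returns a non-empty list, though whether the commit does so is not stated.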

Dockerfile: apk add tesseract-ocr (+rus +eng data), poppler-utils, imagemagick
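Sketch of how the installed tools might be invoked for the OCR fallback. The helper names, the 300 DPI default, and the `needsOcr` threshold are assumptions for illustration; the commit does not say when extract.ts decides to fall back. Only the CLI flags themselves (`pdftoppm -png -r`, `tesseract <image> stdout -l rus+eng`) are standard.

```typescript
// Hypothetical helpers that build argv arrays for the two CLI tools.
type Command = { bin: string; args: string[] };

// pdftoppm renders each PDF page to a PNG named from outPrefix.
// The 300 DPI default is an assumption, not taken from the commit.
function pdftoppmCommand(pdfPath: string, outPrefix: string, dpi: number = 300): Command {
  return { bin: "pdftoppm", args: ["-png", "-r", String(dpi), pdfPath, outPrefix] };
}

// Passing "stdout" as the output base makes tesseract print the recognized text.
function tesseractCommand(imagePath: string, langs: string[] = ["rus", "eng"]): Command {
  return { bin: "tesseract", args: [imagePath, "stdout", "-l", langs.join("+")] };
}

// Assumed heuristic: treat a PDF as scanned when the text layer is nearly empty.
function needsOcr(extractedText: string, minChars: number = 50): boolean {
  return extractedText.replace(/\s+/g, "").length < minChars;
}
```

Each `Command` can be passed to Node's `child_process.execFile(cmd.bin, cmd.args, ...)`, which avoids shell quoting issues with uploaded file names.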

Frontend:
- Templates page: ⤴ Загрузить документ ("Upload document") → file picker (.docx,.pdf,.png,.jpg)
- doc type selector (contract/invoice/act/upd)
- import-banner with spinner shows uploading→analyzing stages
- on success navigates to /templates/:id (TemplateEdit) for review
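Sketch of a client-side pre-check the file picker could run before uploading, based on the limits listed above. This is hypothetical: `checkImportFile`, `TOO_LARGE`, and `BAD_DOC_TYPE` do not appear in the commit, and reading "25 MB" as 25 MiB is an assumption. The server remains the authority on all three checks.

```typescript
// Hypothetical client-side pre-check; only UNSUPPORTED_MIME comes from the commit.
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024; // assumes "25 MB" means 25 MiB
const ACCEPTED_EXTENSIONS = [".docx", ".pdf", ".png", ".jpg"];
const DOC_TYPES = ["contract", "invoice", "act", "upd"];

type FileCheck =
  | { ok: true }
  | { ok: false; reason: "UNSUPPORTED_MIME" | "TOO_LARGE" | "BAD_DOC_TYPE" };

function checkImportFile(name: string, size: number, docType: string): FileCheck {
  // Compare the extension case-insensitively, e.g. "Scan.PDF".
  const dot = name.lastIndexOf(".");
  const ext = dot >= 0 ? name.slice(dot).toLowerCase() : "";
  if (ACCEPTED_EXTENSIONS.indexOf(ext) === -1) {
    return { ok: false, reason: "UNSUPPORTED_MIME" };
  }
  if (size > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: "TOO_LARGE" };
  }
  if (DOC_TYPES.indexOf(docType) === -1) {
    return { ok: false, reason: "BAD_DOC_TYPE" };
  }
  return { ok: true };
}
```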

Reuses DEEPSEEK_API_KEY pattern from Hall-planer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
admin
2026-05-01 11:40:28 +03:00
parent 624d378bb5
commit e768d30fb6
13 changed files with 1114 additions and 7 deletions
+6 -1
@@ -13,7 +13,12 @@ RUN apk add --no-cache \
 ca-certificates \
 ttf-dejavu \
 ttf-liberation \
-font-noto-cjk
+font-noto-cjk \
+tesseract-ocr \
+tesseract-ocr-data-rus \
+tesseract-ocr-data-eng \
+poppler-utils \
+imagemagick
 ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser
 ENV PUPPETEER_SKIP_DOWNLOAD=true
+3
@@ -42,6 +42,9 @@ services:
 TOCHKA_JWT_KEY: ${TOCHKA_JWT_KEY:-}
 TOCHKA_WEBHOOK_SECRET: ${TOCHKA_WEBHOOK_SECRET:-}
 DADATA_API_KEY: ${DADATA_API_KEY:-}
+DEEPSEEK_API_KEY: ${DEEPSEEK_API_KEY:-}
+DEEPSEEK_BASE_URL: ${DEEPSEEK_BASE_URL:-https://api.deepseek.com}
+DEEPSEEK_MODEL: ${DEEPSEEK_MODEL:-deepseek-chat}
 DEFAULT_ORGANIZATION_ID: ${DEFAULT_ORGANIZATION_ID:-00000000-0000-0000-0000-000000000001}
 DEV_BYPASS_AUTH: "0"
 expose:
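
Sketch of how the three DEEPSEEK_* variables added in this hunk might be consumed by a chat completions client with JSON output. It is not the code from deepseek.ts: `loadDeepSeekConfig` and `buildChatRequest` are hypothetical names, and throwing an `Error` whose message is NO_DEEPSEEK_KEY is an assumed way of surfacing that error code. The endpoint path and `response_format: { type: "json_object" }` follow DeepSeek's OpenAI-compatible API.

```typescript
// Hypothetical config loader and request builder; names are assumptions.
type Env = Record<string, string | undefined>;
type DeepSeekConfig = { apiKey: string; baseUrl: string; model: string };

// Mirrors the compose defaults: ${VAR:-default} applies when VAR is unset or empty,
// which is the same condition `||` tests for strings.
function loadDeepSeekConfig(env: Env): DeepSeekConfig {
  const apiKey = env["DEEPSEEK_API_KEY"] || "";
  if (apiKey === "") {
    throw new Error("NO_DEEPSEEK_KEY");
  }
  const baseUrl = env["DEEPSEEK_BASE_URL"] || "https://api.deepseek.com";
  return {
    apiKey,
    baseUrl: baseUrl.replace(/\/+$/, ""), // drop trailing slashes before joining paths
    model: env["DEEPSEEK_MODEL"] || "deepseek-chat",
  };
}

type ChatRequest = {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
};

// Builds the request without sending it, so it can be inspected or passed to fetch().
function buildChatRequest(cfg: DeepSeekConfig, systemPrompt: string, documentText: string): ChatRequest {
  return {
    url: cfg.baseUrl + "/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: "Bearer " + cfg.apiKey,
      },
      body: JSON.stringify({
        model: cfg.model,
        response_format: { type: "json_object" }, // ask for strict JSON output
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: documentText },
        ],
      }),
    },
  };
}
```

Usage would be along the lines of `const req = buildChatRequest(loadDeepSeekConfig(process.env), prompt, text); await fetch(req.url, req.init);`. Note that JSON mode expects the prompt itself to mention JSON.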