feat: import DOCX/PDF/scanned templates via DeepSeek recognition
Backend pipeline:
- POST /api/templates/import (multipart, max 25 MB)
- extract.ts: DOCX→mammoth, PDF→pdf-parse, fallback to OCR via tesseract+poppler-utils
(pdftoppm renders pages to PNG, tesseract reads with rus+eng)
- deepseek.ts: chat completions client with strict JSON response_format
- recognize.ts: structured prompt that produces simplified DocBody (string text),
postprocessor wraps text in TipTap-compatible JSON, validates with zod schema
- prompt enforces placeholder substitution: {{customer.*}}, {{executor.*}},
{{contract.number}}, {{contract.date}}, {{today}}
- error codes: NO_OCR / NO_DEEPSEEK_KEY / UNSUPPORTED_MIME / INVALID_DOC_BODY
Dockerfile: apk add tesseract-ocr (+rus +eng data), poppler-utils, imagemagick
Frontend:
- Templates page: ⤴ Загрузить документ → file picker (.docx,.pdf,.png,.jpg)
- doc type selector (contract/invoice/act/upd)
- import-banner with spinner shows uploading→analyzing stages
- on success navigates to /templates/:id (TemplateEdit) for review
Reuses DEEPSEEK_API_KEY pattern from Hall-planer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -20,15 +20,20 @@
|
||||
"@fastify/cookie": "^9.4.0",
|
||||
"@fastify/cors": "^9.0.1",
|
||||
"@fastify/helmet": "^11.1.1",
|
||||
"@fastify/multipart": "^10.0.0",
|
||||
"@prisma/client": "^5.22.0",
|
||||
"fastify": "^4.28.1",
|
||||
"fastify-plugin": "^4.5.1",
|
||||
"jose": "^5.9.6",
|
||||
"mammoth": "^1.12.0",
|
||||
"node-tesseract-ocr": "^2.2.1",
|
||||
"pdf-parse": "^2.4.5",
|
||||
"puppeteer-core": "^24.42.0",
|
||||
"zod": "^3.23.8"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@types/node": "^20.16.0",
|
||||
"@types/pdf-parse": "^1.1.5",
|
||||
"embedded-postgres": "^18.3.0-beta.17",
|
||||
"pino-pretty": "^11.3.0",
|
||||
"prisma": "^5.22.0",
|
||||
|
||||
Reference in New Issue
Block a user