Entrek Engineering
Operations Portal
Centralized operations platform for a Calgary oil and gas EPCM firm — project management, procurement, AI document extraction, and cloud infrastructure. Designed and built solo. In production on AWS.
Confidential internal system. Architecture, technical decisions, and measurable outcomes are documented below. Screenshots and source code are not publicly available due to enterprise security requirements.
Project Overview
Entrek Engineering Ltd. is a Calgary-based oil and gas engineering and project management firm. Prior to this system, their operations ran across spreadsheets, email threads, and disconnected tools — managing projects, vendors, clients, procurement, field tickets, invoices, and bid evaluations without a unified platform.
I was engaged as the sole developer to build their entire technical infrastructure from zero — architecture, frontend, backend, database, AI integration, DevOps, and cloud deployment.
The result: a production-grade multi-module web platform deployed on AWS, with a React/TypeScript frontend, FastAPI/Python backend, PostgreSQL database, and six integrated AI extraction pipelines — delivered in under three months.
System Architecture
Frontend
React 19 · TypeScript · Vite · Tailwind CSS · Zustand · TanStack Query · Zod · React Hook Form · Recharts · Framer Motion · Feature-Sliced Design
Backend
Python 3.12 · FastAPI · SQLAlchemy 2.0 · Alembic · Pydantic · PostgreSQL (AWS RDS) · JWT authentication · RBAC · AWS S3 (file storage) · APScheduler (notifications)
AI & Document Processing
OpenAI GPT-4 Vision · AsyncOpenAI · asyncio.gather() · Tesseract OCR (orientation correction + preprocessing) · LibreOffice (document conversion) · ODA SDK (DXF/CAD) · pdfplumber · python-docx · pdf2image
Infrastructure
AWS EC2 · AWS RDS PostgreSQL · AWS S3 · Route 53 · VPC · Docker + Docker Compose · Nginx · Let's Encrypt TLS 1.3 · Staging + Production environments · Feature flags via Vite build arguments
Domain Modules
| Module | Core Functionality | AI Engine |
|---|---|---|
| Projects | Project lifecycle, status, deliverables | ✓ PIF extraction |
| Procurement | Purchase orders, shipping, vendor quotes | ✓ PO extraction |
| Bid Spread | Vendor proposals, comparison tables, leveling | ✓ Proposal extraction + leveling |
| Invoices | Invoice management, approval workflows | ✓ Invoice extraction |
| Field Tickets | Field operations documentation | ✓ Field ticket extraction |
| Cost Codes | Project cost structure, client mappings | ✓ AI categorization |
| Budget | Budget tracking, burn analysis, reporting | ✓ Budget extraction |
| P&ID Copilot | Engineering drawing analysis, topology | ✓ LLM agent |
| Clients | Client relationship management | — |
| Vendors | Vendor registry, document vault | — |
| Documents | Hierarchical folder system | — |
| Dashboard | KPIs, charts, activity feed | — |
| Meeting Minutes | Rich text editor, PDF/DOCX export | — |
| Users | Admin management, RBAC | — |
AI Document Extraction Architecture
The most technically demanding aspect of the system. Six separate AI engines handle different document types, all built on the same architectural pattern but each tuned for its own schema and document characteristics.
    import asyncio

    # Concurrent batch processing: 70-page document in ~25s instead of ~75s.
    # load_and_preprocess_pages, process_batch, and merge_results are
    # defined elsewhere in the engine.
    async def extract_document(file_path: str) -> dict:
        pages = await load_and_preprocess_pages(file_path)
        # Preprocessing pipeline per page:
        #   1. Orientation correction (pytesseract OSD)
        #   2. Contrast enhancement + sharpening + upscaling
        #   3. Send to GPT-4 Vision

        # Split the document into batches of 20 pages
        batches = [pages[i:i+20] for i in range(0, len(pages), 20)]

        # All batches processed concurrently
        results = await asyncio.gather(
            *[process_batch(batch) for batch in batches]
        )

        # Intelligent merge: first-non-null wins for core fields,
        # last-non-null for equipment attributes
        return merge_results(results)

Retry logic: 3-attempt exponential backoff on both JSON parse errors and OpenAI API errors. Force-JSON response format prevents free-text deviation. Intelligent result merging handles multi-page documents where fields span pages.
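The retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the production code: `call_with_retry`, `TransientAPIError`, and the `base_delay` parameter are hypothetical names, and the real engines presumably catch the OpenAI client's own exception types rather than this stand-in.

```python
import asyncio
import json
from typing import Awaitable, Callable


class TransientAPIError(Exception):
    """Hypothetical stand-in for a retryable API error (not an OpenAI class)."""


async def call_with_retry(
    call: Callable[[], Awaitable[str]],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> dict:
    """Await `call` and parse its JSON reply, retrying on failure.

    Sleeps base_delay * 2**attempt between tries and re-raises the
    last error once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return json.loads(await call())
        except (json.JSONDecodeError, TransientAPIError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            await asyncio.sleep(base_delay * 2**attempt)
    raise ValueError("attempts must be at least 1")
```

Treating a malformed JSON reply the same way as an API failure keeps the caller simple: it either receives a parsed dictionary or an exception after the final attempt.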
P&ID Copilot (Engineering Drawing Intelligence)
Most novel module. Parses DXF CAD files (engineering process-and-instrumentation diagrams) using ezdxf, extracts equipment topology, and runs an LLM agent over the extracted data for safety auditing, equipment inventory, and Q&A.
Pipeline: DXF → geometry extraction → equipment classification → topology graph → LangChain agent → structured output.
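To make the topology-graph step concrete, here is a small sketch using only the standard library. It assumes the earlier stages have already produced directed connections between equipment tags. The tags, the `connections` list, and both function names are invented for illustration and do not come from the real module.

```python
from collections import defaultdict, deque

# Hypothetical output of the classification step: (from_tag, to_tag) flow pairs.
connections = [
    ("V-101", "P-201"),
    ("P-201", "E-301"),
    ("E-301", "V-102"),
    ("P-201", "PSV-401"),
]


def build_topology(edges: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Build a directed adjacency list from flow connections."""
    graph: dict[str, list[str]] = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    return dict(graph)


def downstream(graph: dict[str, list[str]], start: str) -> list[str]:
    """Return every tag reachable from `start`, in breadth-first order."""
    visited = {start}
    reached: list[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                reached.append(nxt)
                queue.append(nxt)
    return reached


graph = build_topology(connections)
print(downstream(graph, "P-201"))  # ['E-301', 'PSV-401', 'V-102']
```

A structure like this is what lets an agent answer questions such as "what sits downstream of this pump" from data rather than from the drawing image.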
This is the kind of feature that would take a consultant 3–4 weeks to scope. I built it as one module of fourteen.
Multi-Environment Infrastructure
Production and staging environments with complete isolation — separate EC2 instances, separate RDS databases, separate S3 buckets. Feature flags injected at Docker build time mean the same codebase powers both environments with different feature sets. Nginx configured with TLS 1.2/1.3, gzip compression, 600s proxy timeouts for AI operations, and 200MB client body size for document uploads.
Architecture Decision Record
Why Feature-Sliced Design?
Each module needed to be independently deployable, independently testable, and independently understandable. When a bug appears in bid_spread, I need to know it cannot cascade into invoice behavior. FSD enforces this at the filesystem level.
Why FastAPI over Django/Flask?
Asynchronous I/O is mandatory for concurrent AI extraction. Django's sync-first architecture would have required workarounds for asyncio.gather(). FastAPI with async endpoint handlers is the correct tool.
Why Alembic for migrations?
A system this complex requires migration tracking. Alembic's version-controlled schema changes are non-negotiable for a production system with real data. Dropping and recreating tables is not an option once Entrek has operational data.
Why not a task queue for AI extraction?
The documents are small enough that asyncio.gather() within a single request lifecycle provides sufficient concurrency. A task queue adds operational complexity — Redis, worker processes, monitoring — without meaningful benefit at current scale. Explicit decision to revisit as volume grows.
Quantitative Outcomes
| Metric | Before | After |
|---|---|---|
| Procurement tracking | Spreadsheets + email | Unified PO system with AI extraction |
| Invoice processing | Manual data entry | AI extraction in < 30 seconds |
| Bid evaluation | Multiple spreadsheets | Unified comparison with leveling assist |
| Field documentation | Paper forms | Digital with AI extraction |
| Project visibility | Siloed per person | Dashboard with real-time KPIs |
| Document storage | Email attachments | Hierarchical vault with search |
| Engineering drawings | Static PDF/DXF | Queryable via LLM agent |
Engineering Learnings
asyncio.gather() is the right tool for batch AI processing
Sequential page processing for a 70-page document takes ~75 seconds. Concurrent batch processing takes ~25 seconds. The math is simple; the implementation requires careful error handling per batch so one page failure doesn't fail the entire document.
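One way to get that per-batch isolation is `asyncio.gather(..., return_exceptions=True)`, which returns exceptions as values instead of raising the first one. The sketch below assumes pages are plain integers and uses a placeholder `process_batch` that fails on one page. `extract_all` and the simulated failure are illustrative only.

```python
import asyncio


async def process_batch(batch: list[int]) -> dict:
    """Placeholder for the real per-batch extraction call."""
    await asyncio.sleep(0)
    if 13 in batch:  # simulate one unreadable page
        raise ValueError("page 13 could not be parsed")
    return {"pages": batch}


async def extract_all(
    pages: list[int], batch_size: int = 20
) -> tuple[list[dict], list[BaseException]]:
    """Run all batches concurrently and separate successes from failures."""
    batches = [pages[i:i + batch_size] for i in range(0, len(pages), batch_size)]
    outcomes = await asyncio.gather(
        *(process_batch(b) for b in batches),
        return_exceptions=True,  # a failed batch becomes a value, not a raise
    )
    results = [o for o in outcomes if not isinstance(o, BaseException)]
    errors = [o for o in outcomes if isinstance(o, BaseException)]
    return results, errors


results, errors = asyncio.run(extract_all(list(range(1, 71))))
print(len(results), "batches succeeded,", len(errors), "failed")
```

With 70 pages and a batch size of 20 this produces four batches. The batch containing page 13 is reported as an error while the other three still return results.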
Feature flags at build time are underused
Injecting feature flags as Docker build arguments means the same codebase powers staging and production with different feature sets. No runtime flag storage, no configuration service, no additional complexity.
AI engines should be completely isolated from business logic
A bug in proposal_engine.py cannot affect bid_spread_router.py because they're in separate packages with no shared state. This isolation took discipline to maintain — the temptation to share utility functions is constant. The discipline pays off when debugging at 11pm.
Defensive merging matters for multi-page documents
A 70-page proposal might have the vendor name on page 1 and the total price on page 47. The merge strategy was designed after observing how information distributes across real documents. Generic "take first batch" strategies fail in practice.
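A minimal sketch of the merge rule stated earlier (first non-null wins for core fields, last non-null wins for equipment attributes) might look like this. The field names and the `EQUIPMENT_FIELDS` set are assumptions, since the real schemas are not shown here.

```python
from typing import Any

# Hypothetical split between core fields and equipment attributes.
EQUIPMENT_FIELDS = {"equipment_model", "equipment_rating"}


def merge_results(results: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge per-batch extraction dicts, given in page order."""
    merged: dict[str, Any] = {}
    for batch in results:
        for key, value in batch.items():
            if value is None:
                continue  # a null never overwrites or blocks a real value
            if key in EQUIPMENT_FIELDS:
                merged[key] = value  # last non-null wins
            else:
                merged.setdefault(key, value)  # first non-null wins
    return merged


batches = [
    {"vendor": "Acme", "total_price": None, "equipment_model": "A1"},
    {"vendor": "ACME Ltd", "total_price": 1000, "equipment_model": None},
    {"vendor": None, "total_price": 2000, "equipment_model": "A2"},
]
print(merge_results(batches))
# {'vendor': 'Acme', 'equipment_model': 'A2', 'total_price': 1000}
```

Skipping nulls before applying either rule is what allows a value found only in a later batch, such as a total price deep in the document, to fill a field that the first batch left empty.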
Solo full-stack ownership
Complete system designed, built, and deployed by one engineer — frontend, backend, AI pipelines, database, infrastructure, and DevOps.
Production infrastructure
AWS EC2, RDS, S3, Route 53 with separate staging and production environments, TLS 1.3, Docker, health checks, and multi-stage builds.
Industrial domain depth
Built for oil and gas EPCM operations — bid spreads, field tickets, procurement, P&ID drawings. The domain knowledge is inseparable from the technical execution.
Full Stack