Flagship · Full-Stack · Industrial SaaS · Confidential

Entrek Engineering
Operations Portal

Centralized operations platform for a Calgary oil and gas EPCM firm — project management, procurement, AI document extraction, and cloud infrastructure. Designed and built solo. In production on AWS.

$250K+

Agency replacement value

14

Domain modules

6

AI extraction pipelines

< 3 months

Time to production

1

Engineer (solo)

Confidential internal system. Architecture, technical decisions, and measurable outcomes are documented below. Screenshots and source code are not publicly available due to enterprise security requirements.

Project Overview

Entrek Engineering Ltd. is a Calgary-based oil and gas engineering and project management firm. Prior to this system, their operations ran across spreadsheets, email threads, and disconnected tools — managing projects, vendors, clients, procurement, field tickets, invoices, and bid evaluations without a unified platform.

I was engaged as the sole developer to build their entire technical infrastructure from zero — architecture, frontend, backend, database, AI integration, DevOps, and cloud deployment.

The result: a production-grade multi-module web platform deployed on AWS, with a React/TypeScript frontend, FastAPI/Python backend, PostgreSQL database, and six integrated AI extraction pipelines — delivered in under three months.

System Architecture

Frontend

React 19 · TypeScript · Vite · Tailwind CSS
Zustand · TanStack Query · Zod · React Hook Form
Recharts · Framer Motion · Feature-Sliced Design

Backend

Python 3.12 · FastAPI · SQLAlchemy 2.0 · Alembic · Pydantic
PostgreSQL (AWS RDS) · JWT authentication · RBAC
AWS S3 (file storage) · APScheduler (notifications)

AI & Document Processing

OpenAI GPT-4 Vision · AsyncOpenAI · asyncio.gather()
Tesseract OCR (orientation correction + preprocessing)
LibreOffice (document conversion) · ODA SDK (DXF/CAD)
pdfplumber · python-docx · pdf2image

Infrastructure

AWS EC2 · AWS RDS PostgreSQL · AWS S3 · Route 53 · VPC
Docker + Docker Compose · Nginx · Let's Encrypt TLS 1.3
Staging + Production environments
Feature flags via Vite build arguments

Domain Modules

Module	Core Functionality	AI Engine
Projects	Project lifecycle, status, deliverables	✓ PIF extraction
Procurement	Purchase orders, shipping, vendor quotes	✓ PO extraction
Bid Spread	Vendor proposals, comparison tables, leveling	✓ Proposal extraction + leveling
Invoices	Invoice management, approval workflows	✓ Invoice extraction
Field Tickets	Field operations documentation	✓ Field ticket extraction
Cost Codes	Project cost structure, client mappings	✓ AI categorization
Budget	Budget tracking, burn analysis, reporting	✓ Budget extraction
P&ID Copilot	Engineering drawing analysis, topology	✓ LLM agent
Clients	Client relationship management	—
Vendors	Vendor registry, document vault	—
Documents	Hierarchical folder system	—
Dashboard	KPIs, charts, activity feed	—
Meeting Minutes	Rich text editor, PDF/DOCX export	—
Users	Admin management, RBAC	—

AI Document Extraction Architecture

This is the most technically demanding part of the system. Six separate AI engines handle different document types; each follows the same architectural pattern but is tuned for its own schema and document characteristics.

# Concurrent batch processing: 70-page document in ~25s instead of ~75s
import asyncio

BATCH_SIZE = 20  # pages sent to the model per batch

# load_and_preprocess_pages, process_batch, and merge_results are
# project helpers defined elsewhere in each engine.
async def extract_document(file_path: str) -> dict:
    # Preprocessing pipeline per page:
    # 1. Orientation correction (pytesseract OSD)
    # 2. Contrast enhancement + sharpening + upscaling
    pages = await load_and_preprocess_pages(file_path)

    # Split the pages into fixed-size batches
    batches = [pages[i:i + BATCH_SIZE] for i in range(0, len(pages), BATCH_SIZE)]

    # 3. Send to GPT-4 Vision: all batches processed concurrently
    results = await asyncio.gather(
        *[process_batch(batch) for batch in batches]
    )

    # Intelligent merge: first-non-null wins for core fields,
    # last-non-null for equipment attributes
    return merge_results(results)

Retry logic: 3-attempt exponential backoff on both JSON parse errors and OpenAI API errors. Force-JSON response format prevents free-text deviation. Intelligent result merging handles multi-page documents where fields span pages.
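The retry behaviour can be sketched as follows. This is a minimal illustration rather than the production code: `call_with_retry`, `base_delay`, `retry_on`, and the `flaky_model` stand-in are hypothetical names, and a real engine would presumably add the OpenAI client's error types to `retry_on` alongside `json.JSONDecodeError`.

```python
import asyncio
import json
from typing import Awaitable, Callable


async def call_with_retry(
    request: Callable[[], Awaitable[str]],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple = (json.JSONDecodeError,),
) -> dict:
    """Await request() and parse its JSON reply, retrying with exponential backoff."""
    for attempt in range(attempts):
        try:
            raw = await request()
            return json.loads(raw)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles on each retry: base_delay, 2 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Hypothetical stand-in for a model call: two malformed replies, then valid JSON.
_replies = iter(["not json", "{broken", '{"vendor_name": "Acme Pumps"}'])


async def flaky_model() -> str:
    return next(_replies)


parsed = asyncio.run(call_with_retry(flaky_model, base_delay=0.0))
print(parsed)
```

Re-raising on the final attempt keeps the original exception visible to the caller instead of masking it behind a generic failure.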

P&ID Copilot (Engineering Drawing Intelligence)

The most novel module. It parses DXF CAD files (piping and instrumentation diagrams) using ezdxf, extracts equipment topology, and runs an LLM agent over the extracted data for safety auditing, equipment inventory, and Q&A.

Pipeline: DXF → geometry extraction → equipment classification → topology graph → LangChain agent → structured output.
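The topology-graph step can be illustrated without any CAD library. The sketch below assumes geometry extraction and classification have already produced equipment positions and pipe segments, and it connects two pieces of equipment when a segment's endpoints land near both. `Equipment`, `nearest`, `build_topology`, the snap tolerance, and the sample tags are all hypothetical; the real module reads entities through ezdxf and must handle junctions and multi-segment lines, which this simplification ignores.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Equipment:
    tag: str   # e.g. "P-101"
    kind: str  # e.g. "pump", assigned by the classification step
    x: float
    y: float


def nearest(point, items, tol: float) -> Optional[Equipment]:
    """Return the equipment closest to point, or None if nothing is within tol."""
    best = min(items, key=lambda e: math.dist(point, (e.x, e.y)), default=None)
    if best is None or math.dist(point, (best.x, best.y)) > tol:
        return None
    return best


def build_topology(items, segments, tol: float = 1.0) -> dict:
    """Build an undirected adjacency map from pipe segments joining equipment."""
    graph = defaultdict(set)
    for start, end in segments:
        a = nearest(start, items, tol)
        b = nearest(end, items, tol)
        if a is not None and b is not None and a != b:
            graph[a.tag].add(b.tag)
            graph[b.tag].add(a.tag)
    return dict(graph)


items = [
    Equipment("V-100", "vessel", 0.0, 0.0),
    Equipment("P-101", "pump", 10.0, 0.0),
    Equipment("E-102", "exchanger", 10.0, 8.0),
]
segments = [
    ((0.2, 0.1), (9.8, 0.0)),      # vessel to pump
    ((10.0, 0.3), (10.1, 7.9)),    # pump to exchanger
    ((50.0, 50.0), (60.0, 60.0)),  # stray line, not near any equipment
]
graph = build_topology(items, segments)
print(graph)
```

An adjacency map like this is the kind of structured data an agent can answer questions against, such as "what is downstream of P-101?".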

This is the kind of feature that would take a consultant 3–4 weeks to scope. I built it as one module of fourteen.

Multi-Environment Infrastructure

Production and staging environments run in complete isolation: separate EC2 instances, separate RDS databases, and separate S3 buckets. Feature flags injected at Docker build time let the same codebase power both environments with different feature sets. Nginx is configured with TLS 1.2/1.3, gzip compression, 600s proxy timeouts for AI operations, and a 200MB client body size limit for document uploads.

Architecture Decision Record

Why Feature-Sliced Design?

Each module needed to be independently deployable, independently testable, and independently understandable. When a bug appears in bid_spread, I need to know it cannot cascade into invoice behavior. FSD enforces this at the filesystem level.

Why FastAPI over Django/Flask?

Concurrent AI extraction depends on asyncio. Django's sync-first architecture would have required workarounds to run asyncio.gather() inside request handlers, whereas FastAPI supports async endpoint handlers natively, which made it the right fit here.

Why Alembic for migrations?

A system this complex requires migration tracking. Alembic's version-controlled schema changes are non-negotiable for a production system with real data. Dropping and recreating tables is not an option once Entrek has operational data.

Why not a task queue for AI extraction?

The documents are small enough that asyncio.gather() within a single request lifecycle provides sufficient concurrency. A task queue adds operational complexity (Redis, worker processes, monitoring) without meaningful benefit at current scale. This is an explicit decision, to be revisited as volume grows.

Quantitative Outcomes

Metric	Before	After
Procurement tracking	Spreadsheets + email	Unified PO system with AI extraction
Invoice processing	Manual data entry	AI extraction in < 30 seconds
Bid evaluation	Multiple spreadsheets	Unified comparison with leveling assist
Field documentation	Paper forms	Digital with AI extraction
Project visibility	Siloed per person	Dashboard with real-time KPIs
Document storage	Email attachments	Hierarchical vault with search
Engineering drawings	Static PDF/DXF	Queryable via LLM agent

Engineering Learnings

asyncio.gather() is the right tool for batch AI processing

Sequential page processing for a 70-page document takes ~75 seconds. Concurrent batch processing takes ~25 seconds. The math is simple; the implementation requires careful error handling per batch so one page failure doesn't fail the entire document.
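One way to get that per-batch isolation is `asyncio.gather(..., return_exceptions=True)`, which returns exceptions as values instead of cancelling the whole call. The sketch below is an assumption about the approach, not the production code: `gather_tolerant` is a hypothetical helper, and `process_batch` is a stand-in that simulates a failure on one page.

```python
import asyncio


async def gather_tolerant(coros):
    """Run coroutines concurrently; return (successes, failures) instead of raising."""
    outcomes = await asyncio.gather(*coros, return_exceptions=True)
    successes = [o for o in outcomes if not isinstance(o, BaseException)]
    failures = [o for o in outcomes if isinstance(o, BaseException)]
    return successes, failures


async def process_batch(batch: list) -> dict:
    """Stand-in for a real model call: fails if the batch contains page 42."""
    await asyncio.sleep(0)  # yield control, as a network call would
    if 42 in batch:
        raise RuntimeError("simulated extraction failure")
    return {"pages": batch}


async def main():
    pages = list(range(1, 71))  # a 70-page document
    batches = [pages[i:i + 20] for i in range(0, len(pages), 20)]
    return await gather_tolerant([process_batch(b) for b in batches])


successes, failures = asyncio.run(main())
print(len(successes), "batches succeeded;", len(failures), "failed")
```

With batch-level isolation a bad page still costs its whole batch, so a failed batch would need to be retried or split into smaller batches to recover the pages around it.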

Feature flags at build time are underused

Injecting feature flags as Docker build arguments means the same codebase powers staging and production with different feature sets. No runtime flag storage, no configuration service, no additional complexity.

AI engines should be completely isolated from business logic

A bug in proposal_engine.py cannot affect bid_spread_router.py because they're in separate packages with no shared state. This isolation took discipline to maintain — the temptation to share utility functions is constant. The discipline pays off when debugging at 11pm.

Defensive merging matters for multi-page documents

A 70-page proposal might have the vendor name on page 1 and the total price on page 47. The merge strategy was designed after observing how information distributes across real documents. Generic "take first batch" strategies fail in practice.
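The merge rule (first non-null wins for core fields, last non-null wins for equipment attributes) could look roughly like this. It is a sketch under assumptions: the field names (`vendor_name`, `total_price`, `equipment_attributes`) and the sample values are hypothetical, since the real schemas are not public.

```python
from typing import Any

# Hypothetical field name; fields listed here keep the LAST non-null value.
LAST_WINS_FIELDS = {"equipment_attributes"}


def merge_results(results: list) -> dict:
    """Merge per-batch extraction results, given in page order.

    Core fields keep the first non-null value seen; fields in
    LAST_WINS_FIELDS keep the last non-null value.
    """
    merged: dict = {}
    for result in results:
        for key, value in result.items():
            merged.setdefault(key, None)  # keep the key even if it is always null
            if value is None:
                continue  # a null never overwrites anything
            if key in LAST_WINS_FIELDS or merged[key] is None:
                merged[key] = value
    return merged


batch_results = [
    {"vendor_name": "Acme Pumps", "total_price": None,
     "equipment_attributes": {"rating": "150#"}},
    {"vendor_name": None, "total_price": None, "equipment_attributes": None},
    {"vendor_name": "ACME PUMPS LTD", "total_price": 184500.0,
     "equipment_attributes": {"rating": "300#"}},
]
merged = merge_results(batch_results)
print(merged)
```

A naive "take the first batch" merge would return a null `total_price` here, because the price only appears in the last batch.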

Solo full-stack ownership

Complete system designed, built, and deployed by one engineer — frontend, backend, AI pipelines, database, infrastructure, and DevOps.

Production infrastructure

AWS EC2, RDS, S3, Route 53 with separate staging and production environments, TLS 1.3, Docker, health checks, and multi-stage builds.

Industrial domain depth

Built for oil and gas EPCM operations — bid spreads, field tickets, procurement, P&ID drawings. The domain knowledge is inseparable from the technical execution.

Full Stack

React 19 · TypeScript · Vite · Tailwind CSS · Zustand · TanStack Query · Python 3.12 · FastAPI · SQLAlchemy 2.0 · Alembic · Pydantic · OpenAI GPT-4 Vision · LangChain · Tesseract OCR · ezdxf · PostgreSQL · AWS EC2 · AWS RDS · AWS S3 · Route 53 · Docker · Nginx · TLS 1.3
