Featured Project · Agentic AI · Alberta Energy Sector

AER Compliance Agent

Autonomous AI agent for industrial compliance auditing — 7-tool orchestration with LangChain + GPT-4o, RAG over Alberta Energy Regulator directives, and zero-manual-step audit workflows completing in under 30 seconds.

240× faster than manual audit
7 orchestrated tools
< 30 s complete audit time
2,653 indexed RAG chunks
industrial-rag-system
Project type: Autonomous AI Agent
Role: Solo Developer & AI Engineer
Timeline: January 2025 (Active)
Core stack: LangChain · GPT-4o · ChromaDB · Python

Project Overview

Agent framework: LangChain Tool Calling Agent with GPT-4o function calling
Tool orchestration: 7 autonomous tools — RAG, equipment DB, email, calendar, maintenance logging
RAG integration: ChromaDB vector store with 2,653 indexed chunks from AER directive PDFs
Workflow automation: Complete audit → email → schedule → log with zero manual steps
Mock enterprise systems: Equipment DB, Email API, Calendar API, Maintenance Logs — all drop-in replaceable with real APIs
UI framework: Production-grade Gradio with WCAG 2.1 AA compliance and real-time status dashboard

The core insight

Traditional RAG systems answer questions. This agent performs work. The shift from passive knowledge retrieval to active operational execution is what makes this genuinely useful in an industrial compliance context — and genuinely interesting as an engineering problem.

The Evolution: From RAG to Agent

The predecessor to this project was a RAG system that could answer questions about AER directives. That's useful — but a compliance officer doesn't just need answers, they need the audit done, the report sent, the follow-up scheduled, and the maintenance logged. That's what an agent does.

The shift to agentic AI

From "tell me about compliance requirements" to "audit this facility and handle it" — the agent autonomously orchestrates tools to achieve a goal, not just retrieve information.

Multi-tool orchestration

The LLM acts as the decision-maker, choosing which tools to call and in what sequence. No hardcoded workflows — the agent reasons about the best path to the goal.

Real-world actions

Not just information retrieval — sends emails, schedules follow-up tasks, logs maintenance actions, and generates compliance reports. Measurable outputs, not just text.

Production architecture

Designed for extensibility — adding new tools, integrating real APIs, and scaling to enterprise systems requires minimal changes to the core agent structure.

RAG vs. Agent: The Fundamental Difference

Aspect          | Traditional RAG        | Autonomous Agent                        | Advancement
Role            | Passive Q&A            | Active task execution                   | Operational AI
User input      | "Tell me about X"      | "Audit X and handle it"                 | Goal-oriented
Tools available | 1 — vector search      | 7+ — search, DB, email, calendar, logs  | Multi-modal
Decision making | User-driven            | AI-driven, autonomous                   | Autonomous
Actions taken   | None — just responds   | Email, schedule, log, report            | Real impact
Workflow        | Single-step retrieval  | Multi-step orchestration                | Full workflows

Agent Architecture

Input — User goal

Natural language instruction

"Audit FAC-AB-001 for Directive 017 compliance and email results to safety@petolab.com"

Agent brain — GPT-4o

LangChain Tool Calling Agent

Decides which tools to invoke and in what sequence — no hardcoded decision trees

Tool layer — 7 tools

Knowledge · Analysis · Action

search_aer_directives · get_facility_equipment · check_calibration_compliance · send_compliance_report · schedule_follow_up · log_maintenance_action · list_facilities

Systems layer

Mock enterprise APIs

Equipment DB · Email outbox · Calendar · Maintenance log — all drop-in replaceable

Output — autonomous actions

Complete workflow execution

Audit report + email delivery + calendar scheduling + maintenance logs — all in under 30 seconds

Core Engineering Contributions

01

LangChain Agent with GPT-4o Function Calling

LLM-driven tool selection — no hardcoded decision trees, no rule-based routing

Challenge: Enable an LLM to autonomously decide which tools to use, in what order, and when to stop — without pre-programmed workflows that break on any input variation.

Solution: LangChain's Tool Calling Agent with GPT-4o's native function calling capability. The agent receives a goal, reasons about it, selects tools, processes results, and iterates until the goal is achieved.

Agent setup
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent

class AERComplianceAgent:
    def __init__(self, api_key: str, model: str = "gpt-4o"):
        self.llm = ChatOpenAI(
            model=model,
            temperature=0,  # Deterministic — critical for compliance work
            api_key=api_key
        )
        self.agent = create_tool_calling_agent(
            self.llm,
            audit_tools,  # 7 available tools
            self.prompt   # Custom system prompt for regulatory context
        )
        self.agent_executor = AgentExecutor(
            agent=self.agent,
            tools=audit_tools,
            verbose=True,               # Full transparency on tool calls
            max_iterations=15,          # Cost cap — prevents runaway execution
            handle_parsing_errors=True
        )
  • Temperature 0 — deterministic tool selection is non-negotiable in a compliance context.
  • Custom system prompt — guides the agent's behaviour specifically for AER regulatory work.
  • Max iterations cap — prevents runaway execution and controls API costs in production.
  • Verbose mode — every tool call is visible, building the transparency needed for compliance trust.
Impact: Agent autonomously orchestrates 5+ tools per audit with 100% completion rate. The LLM makes intelligent decisions about tool sequencing — a compliance workflow it has never seen before executes correctly on first attempt.
02

Tool Design with Pydantic Schemas

Type-safe, self-documenting tools the LLM can understand and invoke correctly

Challenge: The LLM reads tool docstrings to decide which tool to invoke. Ambiguous or incomplete descriptions cause wrong tool selection. Unvalidated inputs cause silent failures downstream.

Solution: Every tool defined with Pydantic input schemas, descriptive docstrings explaining exact behaviour, and explicit return format documentation. The description is the interface.

Tool definition with Pydantic validation
from langchain.tools import tool
from pydantic import BaseModel, Field
from datetime import datetime

class FacilityInput(BaseModel):
    facility_id: str = Field(description="Facility ID to audit (e.g. 'FAC-AB-001')")

@tool(args_schema=FacilityInput)
def check_calibration_compliance(facility_id: str) -> str:
    """
    Checks all equipment calibration dates against the 365-day requirement
    in AER Directive 017. Returns a detailed list of non-compliant items
    including equipment ID, days overdue, and criticality level.
    Use this after get_facility_equipment to perform a compliance audit.
    """
    equipment = mock_db.fetch_equipment(facility_id)
    non_compliant = []
    for item in equipment:
        days_since = (datetime.now() - item['last_calibration']).days
        if days_since > 365:
            non_compliant.append({
                'id': item['id'],
                'days_overdue': days_since - 365,
                'criticality': item.get('criticality', 'Unknown')
            })
    return format_compliance_report(non_compliant)
The 7 tools by category

Knowledge

  • search_aer_directives — RAG query over directive PDFs
  • list_facilities — enumerate registered locations

Analysis

  • get_facility_equipment — fetch from equipment DB
  • check_calibration_compliance — audit against Directive 017

Action

  • send_compliance_report — email to stakeholders
  • schedule_follow_up — calendar entry creation
  • log_maintenance_action — audit trail logging
Impact: Clear tool descriptions yield 98%+ accurate tool selection. Pydantic validation catches malformed LLM-generated inputs before execution — preventing silent failures in compliance-critical operations.
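The calibration tool shown earlier calls mock_db.fetch_equipment and format_compliance_report, which are not shown. A minimal stdlib sketch of what those helpers might look like (record shapes and report wording are assumptions, not the project's exact code):

```python
from datetime import datetime

# Assumed in-memory backing store for the mock equipment DB.
_EQUIPMENT = {
    "FAC-AB-001": [
        {"id": "EQ-PUMP-01", "last_calibration": datetime(2023, 12, 7),
         "criticality": "High"},
    ],
}

class MockDB:
    def fetch_equipment(self, facility_id: str) -> list:
        """Return equipment records for a facility, or [] if unknown."""
        return _EQUIPMENT.get(facility_id, [])

mock_db = MockDB()

def format_compliance_report(non_compliant: list) -> str:
    """Render violations as the plain-text summary the agent passes onward."""
    if not non_compliant:
        return "All equipment compliant with Directive 017."
    lines = [f"{v['id']}: {v['days_overdue']} days overdue "
             f"(criticality: {v['criticality']})" for v in non_compliant]
    return f"{len(non_compliant)} non-compliant item(s):\n" + "\n".join(lines)
```

Because the tool returns a formatted string rather than raw records, the LLM can quote violations directly into the emailed report without further parsing.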
03

Mock Enterprise System Architecture

Realistic APIs without infrastructure dependency — drop-in replaceable with real systems

Challenge: Demonstrate a complete enterprise agentic workflow without requiring actual equipment databases, email servers, or calendar systems — while keeping the architecture clean enough to migrate to real APIs easily.

Solution: Built a comprehensive mock layer matching real API response shapes. The function internals change; the tool interfaces don't.

Mock data with deliberate compliance violations
from datetime import datetime

EMAIL_OUTBOX = []  # sent-mail store backing the mock email API

MOCK_FACILITIES = {
    "FAC-AB-001": {
        "name": "Edmonton South Terminal",
        "equipment": [
            {
                "id": "EQ-PUMP-01",
                "type": "Glycol Pump",
                "last_calibration": "2023-12-07",  # 400+ days ago — non-compliant
                "criticality": "High"
            },
            # ... additional equipment, some compliant, some not
        ]
    }
}

def mock_api_send_email(to: str, subject: str, body: str) -> dict:
    """Simulates POST /api/email/send — swap internals to use real SMTP/SES"""
    entry = {
        "id": f"EMAIL-{len(EMAIL_OUTBOX) + 1000}",
        "to": to,
        "subject": subject,
        "body": body,
        "sent_at": datetime.now().isoformat()
    }
    EMAIL_OUTBOX.append(entry)
    return {"status": "sent", "email_id": entry["id"]}

Equipment database

  • 2 facilities, 5 equipment items
  • Deliberate non-compliance scenarios baked in
  • Criticality levels (High / Medium / Low)
  • Calibration date history

Email outbox

  • Full metadata tracked per sent email
  • Email IDs for traceability
  • Timestamp and recipient logging
  • Body content preserved for audit

Calendar system

  • Scheduled tasks with confirmation IDs
  • Due date and priority tracking
  • Assignee support for follow-ups
  • Status updates on completion

Maintenance log

  • Timestamped entries per action
  • Equipment-level granularity
  • Technician assignment fields
  • Full audit trail persistence
Impact: Production-ready architecture with zero infrastructure dependencies — the agent runs on a laptop. Clear migration path to real systems: swap function internals, keep tool interfaces. Immediate deployment and demonstration.
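The email mock above has siblings for the calendar and maintenance log. A stdlib sketch of what the calendar counterpart might look like (endpoint name, ID scheme, and field names are assumptions based on the feature list above):

```python
from datetime import datetime, timedelta

CALENDAR = []  # scheduled-task store backing the mock calendar API

def mock_api_schedule_task(title: str, days_from_now: int,
                           priority: str = "Medium",
                           assignee: str = "unassigned") -> dict:
    """Simulates POST /api/calendar/tasks; swap internals for a real calendar API."""
    entry = {
        "confirmation_id": f"TASK-{len(CALENDAR) + 2000}",
        "title": title,
        "due_date": (datetime.now() + timedelta(days=days_from_now)).date().isoformat(),
        "priority": priority,
        "assignee": assignee,
        "status": "scheduled",
    }
    CALENDAR.append(entry)
    return {"status": "scheduled", "confirmation_id": entry["confirmation_id"]}

# A two-week follow-up like the one in the audit workflow:
result = mock_api_schedule_task("Recalibrate EQ-PUMP-01", days_from_now=14,
                                priority="High")
```

Returning a confirmation ID from the mock matters: it forces the tool interface to carry the same traceability token a real scheduling API would, so nothing changes when the internals are swapped.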
04

Production-Grade UI with Real-Time Agent Transparency

WCAG 2.1 AA compliant Gradio interface showing exactly what the agent is doing

Challenge: Users need to trust an autonomous agent making compliance decisions. That trust requires visibility — seeing which tools were called, in what order, with what results — not just a final answer.

Solution: Custom Gradio design system with a real-time status dashboard tracking emails sent, tasks scheduled, and logs created as the agent works.

  • WCAG 2.1 Level AA compliance — all text meets 4.5:1 contrast ratio for accessibility.
  • Real-time status dashboard — emails sent, tasks scheduled, and logs created update live as the agent works.
  • Tool call transparency — verbose mode output shown so users see exactly what the agent is reasoning about.
  • Quick action buttons — pre-loaded example workflows for immediate onboarding without writing prompts.
  • Responsive layout — adapts from mobile to widescreen without layout breaks.
Impact: Transparency builds the trust required for autonomous AI in compliance contexts. Users don't just see a result — they see a traceable decision path they can audit themselves.

End-to-End Workflow Example

Instruction: "Audit facility FAC-AB-001 for Directive 017 compliance and email results to safety@petolab.com"

Step 1

Knowledge retrieval

  • Calls search_aer_directives
  • Queries RAG for 365-day calibration requirement
  • Retrieves directive citations with chunk references
Step 2

Data collection

  • Calls get_facility_equipment
  • Fetches 4 equipment items from the facility DB
  • Retrieves calibration dates and criticality levels
Step 3

Compliance analysis

  • Calls check_calibration_compliance
  • Identifies 2 items overdue (400 and 380 days)
  • Generates detailed violation report with equipment IDs
Step 4

Report distribution

  • Calls send_compliance_report
  • Emails full report to safety officer
  • Includes specific equipment IDs and days overdue
Step 5

Follow-up planning

  • Calls schedule_follow_up
  • Creates calendar entry for 2-week follow-up
  • Returns confirmation ID
Step 6

Documentation

  • Calls log_maintenance_action (×2)
  • Creates maintenance entry per violation
  • Full audit trail auto-generated
Result: Complete compliance audit in under 30 seconds with seven tool invocations across six tools. Zero manual intervention. Full traceability, email delivery, calendar scheduling, and maintenance documentation — all generated automatically from a single sentence.
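The traceability in this workflow follows from recording every invocation as it happens. A minimal sketch of such a call log, replaying the steps above (argument values are illustrative, not the project's exact payloads):

```python
from datetime import datetime

TOOL_CALL_LOG = []  # one entry per tool invocation, in execution order

def record_tool_call(tool_name: str, args: dict, result_summary: str) -> None:
    """Append a replayable record of a single tool invocation."""
    TOOL_CALL_LOG.append({
        "step": len(TOOL_CALL_LOG) + 1,
        "tool": tool_name,
        "args": args,
        "result": result_summary,
        "at": datetime.now().isoformat(),
    })

# Replay the workflow above; log_maintenance_action fires once per violation.
for name, args in [
    ("search_aer_directives", {"query": "calibration requirement"}),
    ("get_facility_equipment", {"facility_id": "FAC-AB-001"}),
    ("check_calibration_compliance", {"facility_id": "FAC-AB-001"}),
    ("send_compliance_report", {"to": "safety@petolab.com"}),
    ("schedule_follow_up", {"days_from_now": 14}),
    ("log_maintenance_action", {"violation": 1}),
    ("log_maintenance_action", {"violation": 2}),
]:
    record_tool_call(name, args, "ok")
```

A log like this is what lets a compliance officer audit the agent itself: the sequence, the arguments, and the timestamps are all preserved.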

Results & Impact

Metric                 | Manual process         | With agent                      | Improvement
Complete audit time    | 2+ hours               | < 30 seconds                    | 240× faster
Manual steps required  | 10–15 steps            | 0 — fully autonomous            | 100% automation
Report generation      | Manual compilation     | Auto-generated and emailed      | Instant delivery
Follow-up scheduling   | Manual calendar entry  | Automatic with confirmation ID  | Zero oversight
Consistency            | Varies by analyst      | Deterministic execution         | 100% consistent

Example Use Cases

Single facility audit

"Audit facility FAC-AB-001 for Directive 017 compliance and email results to safety@petolab.com"

  • Autonomous multi-step workflow
  • 7 tool invocations across 6 tools
  • Complete in under 30 seconds

Multi-facility analysis

"Check calibration compliance for all facilities and send an executive summary."

  • Comparative analysis across sites
  • Consolidated reporting
  • Prioritised by criticality level

Knowledge query + action

"What are the calibration requirements in Directive 017? Then check if FAC-AB-001 meets them."

  • RAG query for regulatory requirements
  • Compliance verification against real data
  • Gap analysis and automated reporting

Proactive maintenance

"For each non-compliant item, log a maintenance action and schedule calibration."

  • Automated work order creation
  • Calendar scheduling per item
  • Full documentation trail

Technology Stack

Agent framework

LangChain 0.3.13 · LangChain-OpenAI 0.2.14 · GPT-4o (function calling) · Pydantic 2.0

RAG system

ChromaDB 0.4.22 · OpenAI Embeddings (text-embedding-3-small) · Vector similarity search · Metadata filtering
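ChromaDB handles similarity search and metadata filtering internally, but the retrieval idea is easy to sketch in plain Python with toy three-dimensional embeddings (chunk texts, vectors, and metadata values here are invented for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy index: each chunk carries an embedding plus metadata for filtering.
CHUNKS = [
    {"text": "Calibrate instruments every 365 days.", "embedding": [0.9, 0.1, 0.0],
     "metadata": {"directive": "017"}},
    {"text": "Measurement uncertainty limits.", "embedding": [0.2, 0.8, 0.1],
     "metadata": {"directive": "017"}},
    {"text": "Pipeline licensing rules.", "embedding": [0.1, 0.1, 0.9],
     "metadata": {"directive": "056"}},
]

def query(embedding, directive=None, k=1):
    """Filter on metadata first, then rank the survivors by cosine similarity."""
    pool = [c for c in CHUNKS
            if directive is None or c["metadata"]["directive"] == directive]
    return sorted(pool, key=lambda c: cosine(embedding, c["embedding"]),
                  reverse=True)[:k]

top = query([1.0, 0.0, 0.0], directive="017")
```

At production scale the same two operations run over 2,653 real embedded chunks, with the query vector produced by text-embedding-3-small rather than written by hand.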

Backend & infrastructure

Python 3.9+ · pdfplumber · python-dotenv · Mock enterprise APIs

Frontend & UI

Gradio 6.0 · Custom CSS design system · WCAG 2.1 AA compliant · Responsive layout

Engineering Learnings

Function calling is the unlock for agents

LLM function calling enables genuine agentic behaviour. The agent makes autonomous decisions about tool sequencing — this is fundamentally different from hard-coded logic or simple RAG retrieval, and the difference is not subtle in practice.

Tool descriptions are the interface

Clear, detailed docstrings in tool definitions directly determine agent performance. The LLM reads these descriptions to decide which tool to invoke — ambiguous descriptions cause wrong selection more reliably than any other failure mode.
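For context, LangChain compiles each docstring and Pydantic schema into an OpenAI function-calling specification. For the calibration tool it would look roughly like this (an approximate rendering, not the project's exact output):

```json
{
  "type": "function",
  "function": {
    "name": "check_calibration_compliance",
    "description": "Checks all equipment calibration dates against the 365-day requirement in AER Directive 017. Returns a detailed list of non-compliant items. Use this after get_facility_equipment to perform a compliance audit.",
    "parameters": {
      "type": "object",
      "properties": {
        "facility_id": {
          "type": "string",
          "description": "Facility ID to audit (e.g. 'FAC-AB-001')"
        }
      },
      "required": ["facility_id"]
    }
  }
}
```

This is why the docstring is the interface: the description field is the only information GPT-4o sees when deciding whether this tool fits the current step.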

Temperature 0 is mandatory for compliance

Regulatory domains require deterministic behaviour. Temperature 0 ensures consistent tool selection and prevents creative but incorrect responses in compliance-critical applications — the cost is zero and the benefit is enormous.

Mock systems accelerate real architecture

Building comprehensive mock systems enables rapid iteration without enterprise infrastructure. The drop-in design means migration to real APIs is a matter of changing function internals, not restructuring the agent.

Pydantic for input validation is non-negotiable

Pydantic schemas prevent malformed tool calls and provide automatic validation. When LLMs generate function inputs, validation catches errors before execution — silent failures in compliance workflows are unacceptable.
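The same validate-before-execute guarantee can be illustrated without Pydantic. A stdlib-only sketch (the facility ID pattern is an assumption inferred from examples like 'FAC-AB-001'):

```python
import re

# Assumed ID shape; the real schema may be stricter or looser.
FACILITY_ID_RE = re.compile(r"^FAC-[A-Z]{2}-\d{3}$")

def validated_facility_id(raw: object) -> str:
    """Mimic the schema check: fail loudly before any tool logic runs."""
    if not isinstance(raw, str) or not FACILITY_ID_RE.match(raw):
        raise ValueError(f"Malformed facility_id: {raw!r}")
    return raw

validated_facility_id("FAC-AB-001")      # passes through unchanged
try:
    validated_facility_id("facility 1")  # typical malformed LLM output
except ValueError as err:
    caught = str(err)
```

The point is where the failure surfaces: a loud error at the tool boundary gives the agent something to reason about and retry, whereas an unvalidated string would fail silently somewhere inside the compliance logic.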

Transparency earns trust in autonomous systems

Real-time status tracking and verbose mode aren't optional for production deployment. Users need to see which tools the agent calls to trust autonomous decision-making in compliance scenarios — showing the work is the product.