Featured Project · Agentic AI · Alberta Energy Sector
Autonomous AI agent for industrial compliance auditing — 7-tool orchestration with LangChain + GPT-4o, RAG over Alberta Energy Regulator directives, and zero-manual-step audit workflows completing in under 30 seconds.
Traditional RAG systems answer questions. This agent performs work. The shift from passive knowledge retrieval to active operational execution is what makes this genuinely useful in an industrial compliance context — and genuinely interesting as an engineering problem.
The predecessor to this project was a RAG system that could answer questions about AER directives. That's useful — but a compliance officer doesn't just need answers, they need the audit done, the report sent, the follow-up scheduled, and the maintenance logged. That's what an agent does.
From "tell me about compliance requirements" to "audit this facility and handle it" — the agent autonomously orchestrates tools to achieve a goal, not just retrieve information.
The LLM acts as the decision-maker, choosing which tools to call and in what sequence. No hardcoded workflows — the agent reasons about the best path to the goal.
Not just information retrieval — sends emails, schedules follow-up tasks, logs maintenance actions, and generates compliance reports. Measurable outputs, not just text.
Designed for extensibility — adding new tools, integrating real APIs, and scaling to enterprise systems requires minimal changes to the core agent structure.
From instruction to outcome, the pipeline runs in four stages:

1. Natural language instruction — "Audit FAC-AB-001 for Directive 017 compliance and email results to safety@petolab.com"
2. LangChain Tool Calling Agent — decides which tools to invoke and in what sequence; no hardcoded decision trees
3. Tool layer (Knowledge · Analysis · Action) — mock enterprise APIs: equipment DB, email outbox, calendar, maintenance log, all drop-in replaceable
4. Complete workflow execution — audit report, email delivery, calendar scheduling, and maintenance logs, all in under 30 seconds
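The four-stage flow can be sketched as a minimal orchestration loop. This is an illustrative stdlib sketch, not the project's code: the tool names are borrowed from the project, but the scripted `plan` stands in for the LLM's function-calling decisions, which in the real agent are made turn by turn.

```python
# Minimal sketch of the orchestration loop. The scripted "plan" stands in
# for the LLM's function-calling step; in production the model chooses
# each tool call from the goal and the previous results.
def run_agent(goal: str, tools: dict, plan: list) -> list:
    """Execute tool calls in the order the decision-maker chooses."""
    results = []
    for step in plan:  # in production, the LLM picks each step
        tool = tools[step["tool"]]
        results.append(tool(**step["args"]))
    return results

# Stand-in tools (the real ones hit the mock enterprise APIs)
tools = {
    "get_facility_equipment": lambda facility_id: f"equipment for {facility_id}",
    "check_calibration_compliance": lambda facility_id: f"audit of {facility_id}",
}
plan = [
    {"tool": "get_facility_equipment", "args": {"facility_id": "FAC-AB-001"}},
    {"tool": "check_calibration_compliance", "args": {"facility_id": "FAC-AB-001"}},
]
print(run_agent("Audit FAC-AB-001", tools, plan))
```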
LangChain Agent with GPT-4o Function Calling
LLM-driven tool selection — no hardcoded decision trees, no rule-based routing
Challenge: Enable an LLM to autonomously decide which tools to use, in what order, and when to stop — without pre-programmed workflows that break on any input variation.
Solution: LangChain's Tool Calling Agent with GPT-4o's native function calling capability. The agent receives a goal, reasons about it, selects tools, processes results, and iterates until the goal is achieved.
Agent setup

from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent

class AERComplianceAgent:
    def __init__(self, api_key: str, model: str = "gpt-4o"):
        self.llm = ChatOpenAI(
            model=model,
            temperature=0,  # Deterministic — critical for compliance work
            api_key=api_key
        )
        self.agent = create_tool_calling_agent(
            self.llm,
            audit_tools,  # 7 available tools
            self.prompt   # Custom system prompt for regulatory context
        )
        self.agent_executor = AgentExecutor(
            agent=self.agent,
            tools=audit_tools,
            verbose=True,  # Full transparency on tool calls
            max_iterations=15,  # Cost cap — prevents runaway execution
            handle_parsing_errors=True
        )

Tool Design with Pydantic Schemas
Type-safe, self-documenting tools the LLM can understand and invoke correctly
Challenge: The LLM reads tool docstrings to decide which tool to invoke. Ambiguous or incomplete descriptions cause wrong tool selection. Unvalidated inputs cause silent failures downstream.
Solution: Every tool defined with Pydantic input schemas, descriptive docstrings explaining exact behaviour, and explicit return format documentation. The description is the interface.
Tool definition with Pydantic validation

from datetime import datetime

from langchain.tools import tool
from pydantic import BaseModel, Field

class FacilityInput(BaseModel):
    facility_id: str = Field(description="Facility ID to audit (e.g. 'FAC-AB-001')")

@tool(args_schema=FacilityInput)
def check_calibration_compliance(facility_id: str) -> str:
    """
    Checks all equipment calibration dates against the 365-day requirement
    in AER Directive 017. Returns a detailed list of non-compliant items
    including equipment ID, days overdue, and criticality level.
    Use this after get_facility_equipment to perform a compliance audit.
    """
    equipment = mock_db.fetch_equipment(facility_id)
    non_compliant = []
    for item in equipment:
        # Mock records store dates as 'YYYY-MM-DD' strings: parse before comparing
        last_calibration = datetime.strptime(item['last_calibration'], "%Y-%m-%d")
        days_since = (datetime.now() - last_calibration).days
        if days_since > 365:
            non_compliant.append({
                'id': item['id'],
                'days_overdue': days_since - 365,
                'criticality': item.get('criticality', 'Unknown')
            })
    return format_compliance_report(non_compliant)

The seven tools:
search_aer_directives — RAG query over directive PDFs
list_facilities — enumerate registered locations
get_facility_equipment — fetch from equipment DB
check_calibration_compliance — audit against Directive 017
send_compliance_report — email to stakeholders
schedule_follow_up — calendar entry creation
log_maintenance_action — audit trail logging

Mock Enterprise System Architecture
Realistic APIs without infrastructure dependency — drop-in replaceable with real systems
Challenge: Demonstrate a complete enterprise agentic workflow without requiring actual equipment databases, email servers, or calendar systems — while keeping the architecture clean enough to migrate to real APIs easily.
Solution: Built a comprehensive mock layer matching real API response shapes. The function internals change; the tool interfaces don't.
Mock data with deliberate compliance violations

MOCK_FACILITIES = {
    "FAC-AB-001": {
        "name": "Edmonton South Terminal",
        "equipment": [
            {
                "id": "EQ-PUMP-01",
                "type": "Glycol Pump",
                "last_calibration": "2023-12-07",  # 400+ days ago — non-compliant
                "criticality": "High"
            },
            # ... additional equipment, some compliant, some not
        ]
    }
}

def mock_api_send_email(to: str, subject: str, body: str) -> dict:
    """Simulates POST /api/email/send — swap internals to use real SMTP/SES"""
    entry = {
        "id": f"EMAIL-{len(EMAIL_OUTBOX) + 1000}",
        "to": to, "subject": subject, "body": body,
        "sent_at": datetime.now().isoformat()
    }
    EMAIL_OUTBOX.append(entry)
    return {"status": "sent", "email_id": entry["id"]}

Production-Grade UI with Real-Time Agent Transparency
WCAG 2.1 AA compliant Gradio interface showing exactly what the agent is doing
Challenge: Users need to trust an autonomous agent making compliance decisions. That trust requires visibility — seeing which tools were called, in what order, with what results — not just a final answer.
Solution: Custom Gradio design system with a real-time status dashboard tracking emails sent, tasks scheduled, and logs created as the agent works.
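The state behind such a dashboard can be sketched with a small counter object. The `AgentStatus` class and its field names are illustrative, not from the project: the idea is simply that every tool call is recorded, and the UI re-renders the running totals as the agent works.

```python
from dataclasses import dataclass, field

# Hypothetical status tracker behind the dashboard: each tool call
# increments a counter, and the UI re-renders the totals live.
@dataclass
class AgentStatus:
    emails_sent: int = 0
    tasks_scheduled: int = 0
    logs_created: int = 0
    tool_calls: list = field(default_factory=list)

    def record(self, tool_name: str) -> None:
        self.tool_calls.append(tool_name)
        if tool_name == "send_compliance_report":
            self.emails_sent += 1
        elif tool_name == "schedule_follow_up":
            self.tasks_scheduled += 1
        elif tool_name == "log_maintenance_action":
            self.logs_created += 1

status = AgentStatus()
for call in ["send_compliance_report", "schedule_follow_up",
             "log_maintenance_action", "log_maintenance_action"]:
    status.record(call)
print(status.emails_sent, status.tasks_scheduled, status.logs_created)  # 1 1 2
```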
Instruction: "Audit facility FAC-AB-001 for Directive 017 compliance and email results to safety@petolab.com"
Tool call sequence: search_aer_directives → get_facility_equipment → check_calibration_compliance → send_compliance_report → schedule_follow_up → log_maintenance_action (×2)

Example instructions:
"Audit facility FAC-AB-001 for Directive 017 compliance and email results to safety@petolab.com"
"Check calibration compliance for all facilities and send an executive summary."
"What are the calibration requirements in Directive 017? Then check if FAC-AB-001 meets them."
"For each non-compliant item, log a maintenance action and schedule calibration."
Agent framework
RAG system
Backend & infrastructure
Frontend & UI
LLM function calling enables genuine agentic behaviour. The agent makes autonomous decisions about tool sequencing — this is fundamentally different from hard-coded logic or simple RAG retrieval, and the difference is not subtle in practice.
Clear, detailed docstrings in tool definitions directly determine agent performance. The LLM reads these descriptions to decide which tool to invoke — ambiguous descriptions cause wrong selection more reliably than any other failure mode.
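To make this concrete, here is a sketch of why the docstring matters: tool-calling frameworks typically derive the description the LLM sees directly from the function's docstring. The `tool_description` helper below is hypothetical (LangChain's `@tool` does the equivalent internally), but the mechanism is the same.

```python
import inspect

# Hypothetical helper: a tool-calling framework derives the tool spec
# the LLM sees from the function name and docstring.
def tool_description(fn) -> dict:
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),  # this text is the LLM's only guide
    }

def list_facilities() -> str:
    """Enumerate all registered facility IDs. Use before auditing a site."""
    return "FAC-AB-001"

spec = tool_description(list_facilities)
print(spec["name"], "->", spec["description"])
```

If the docstring were just "lists stuff", the LLM would have nothing to route on; the spec above is the entire interface.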
Regulatory domains require deterministic behaviour. Temperature 0 ensures consistent tool selection and prevents creative but incorrect responses in compliance-critical applications — the cost is zero and the benefit is enormous.
Building comprehensive mock systems enables rapid iteration without enterprise infrastructure. The drop-in design means migration to real APIs is a matter of changing function internals, not restructuring the agent.
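The drop-in design can be sketched as an interface the agent depends on. The `EmailClient` protocol and `MockEmailClient` names below are illustrative, not the project's: the point is that swapping mock for real SMTP/SES touches only the implementation, never the agent or its tools.

```python
from typing import Protocol

# The agent depends on this interface, not on any concrete backend.
class EmailClient(Protocol):
    def send(self, to: str, subject: str, body: str) -> dict: ...

# Mock implementation: append to an in-memory outbox. A real SMTP/SES
# client would satisfy the same protocol with different internals.
class MockEmailClient:
    def __init__(self):
        self.outbox = []

    def send(self, to: str, subject: str, body: str) -> dict:
        self.outbox.append({"to": to, "subject": subject, "body": body})
        return {"status": "sent", "email_id": f"EMAIL-{len(self.outbox) + 999}"}

client: EmailClient = MockEmailClient()
result = client.send("safety@petolab.com", "Audit results", "2 items overdue")
print(result["status"], result["email_id"])  # sent EMAIL-1000
```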
Pydantic schemas prevent malformed tool calls and provide automatic validation. When LLMs generate function inputs, validation catches errors before execution — silent failures in compliance workflows are unacceptable.
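The kind of check Pydantic performs can be sketched with the standard library alone (the ID pattern below is illustrative, not from an AER spec): a malformed tool input raises immediately instead of propagating into downstream calls.

```python
import re
from dataclasses import dataclass

# Illustrative facility-ID pattern, e.g. 'FAC-AB-001'
FACILITY_ID = re.compile(r"^FAC-[A-Z]{2}-\d{3}$")

# Stdlib sketch of the validation Pydantic provides: reject malformed
# tool inputs before any downstream call runs.
@dataclass
class FacilityInput:
    facility_id: str

    def __post_init__(self):
        if not FACILITY_ID.match(self.facility_id):
            raise ValueError(f"Malformed facility_id: {self.facility_id!r}")

FacilityInput("FAC-AB-001")      # passes
try:
    FacilityInput("facility 1")  # an LLM typo is caught loudly...
except ValueError as e:
    print(e)                     # ...instead of failing silently downstream
```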
Real-time status tracking and verbose mode aren't optional for production deployment. Users need to see which tools the agent calls to trust autonomous decision-making in compliance scenarios — showing the work is the product.