Developer Sandbox: Build a Travel Booking Agent Demo

Hands-on tutorial to build a safe, constrained travel booking agent using public APIs—complete with safety checks, human approval, and a test harness.

Hook: Stop losing hours to context switching — build a safe, testable travel booking agent

If your team is still juggling multiple travel sites, copy-pasting itineraries, and manually validating bookings, you're paying in time and risk. This tutorial walks experienced developers and IT admins through building a constrained agent—a Qwen-style, tool-enabled assistant—that searches and books travel using public travel APIs while enforcing safety checks, human approvals, and a developer-friendly test harness.

Why this matters in 2026

Agentic assistants matured in 2024–2025; in late 2025 Alibaba expanded Qwen with agentic features that perform real-world tasks like booking travel. In early 2026 we saw the AI landscape move from pure chat to safe, tool-augmented agents integrated into enterprise workflows. At the same time, regulatory scrutiny (for example, EU AI Act enforcement ramping up in 2026) and security concerns make it essential to build constrained, auditable agents that can be tested and verified.

What you'll build

By the end you'll have a reproducible pattern and sample code to implement a constrained booking agent that:

Searches flights and hotels via public/sandbox travel APIs (Amadeus, Skyscanner, or a mocking proxy)
Applies corporate travel policy and spend constraints
Requires human approval for high-risk actions
Provides a test harness for unit, contract, and end-to-end tests
Logs auditable events and metrics for compliance

Architecture overview

Keep the architecture simple and modular. The core components:

LLM Controller – Receives user intent and decides which tool to call.
Tool Adapters – Thin wrappers around travel APIs: searchFlights, getFareRules, holdBooking, confirmBooking.
Policy Engine – Enforces constraints (max spend, allowed carriers, timeframe).
Human Approval Flow – UI or alerting for approval before confirmBooking.
Auditable Store – Immutable event log (append-only) for compliance. See guidance on architecting audit trails and secure marketplaces for patterns you can borrow.
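One way to wire these components together is a small tool registry that the LLM Controller must go through. The sketch below is a minimal illustration, not a library API. `createToolRegistry`, the `requiresApproval` flag, and the `ctx.approval` shape are hypothetical names chosen for this example. The registry refuses to run any tool flagged as needing approval, such as confirmBooking, unless the caller passes an approved approval record. It records every call in an in-memory list that stands in for the Auditable Store.

```javascript
// Hypothetical tool registry: every tool call from the LLM Controller goes through invoke().
function createToolRegistry() {
  const tools = new Map();
  const auditLog = []; // in-memory stand-in for the append-only Auditable Store

  function register(name, handler, { requiresApproval = false } = {}) {
    tools.set(name, { handler, requiresApproval });
  }

  async function invoke(name, args, ctx = {}) {
    const tool = tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`); // only registered tools may run
    if (tool.requiresApproval && ctx.approval?.status !== 'approved') {
      auditLog.push({ tool: name, outcome: 'blocked', at: new Date().toISOString() });
      throw new Error(`${name} requires an approved ApprovalRequest`);
    }
    const result = await tool.handler(args);
    auditLog.push({ tool: name, outcome: 'ok', approvalId: ctx.approval?.id ?? null, at: new Date().toISOString() });
    return result;
  }

  return { register, invoke, auditLog };
}

module.exports = { createToolRegistry };
```

Register read-only tools such as searchFlights normally, and register confirmBooking with `{ requiresApproval: true }`. The runtime check then holds even if the prompt constraint fails.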

Prerequisites

Node.js 18+ or Python 3.11+
Accounts for public travel APIs (use sandbox keys where possible): Amadeus for Developers, Skyscanner Partners, or vendor sandbox endpoints
Secrets store (Vault, AWS Secrets Manager, or environment-encrypted secrets) — consider hardened workflows and reviews like those outlined in the TitanVault & SeedVault review for inspiration.
Docker and a CI pipeline (GitHub Actions/GitLab CI) for sandboxed tests

Design principles

Constrain WHAT the agent can do — only expose deterministic tool functions with strict schemas.
Limit WHO can approve — role-based approvals for booking confirmations.
Fail safe — never call confirmBooking in production without human approval in enforced workflows.
Testable — mock providers and deterministic LLM reply simulations for unit tests.
Audited — log every tool call, decision, and approval.

Step 1 — Implement tool adapters (example: Node.js)

Tool adapters translate the agent's intent into API calls. Keep them thin, with no business logic, and have them return structured results. Below is a minimal searchFlights adapter against a hypothetical public API (replace it with your vendor's SDK).

// src/tools/searchFlights.js
// Pin node-fetch@2: it supports require(), while node-fetch v3 is ESM-only.
const fetch = require('node-fetch');

async function searchFlights({ origin, destination, departDate, returnDate, passengers = 1, apiKey, sandbox = true }) {
  const base = sandbox ? 'https://sandbox.travelapi.example.com' : 'https://api.travelapi.example.com';
  // URLSearchParams encodes values; returnDate is omitted for one-way searches instead of sending "undefined"
  const params = new URLSearchParams({ origin, destination, depart: departDate, pax: String(passengers) });
  if (returnDate) params.set('return', returnDate);
  const res = await fetch(`${base}/v1/flights/search?${params}`, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`API error ${res.status}`);
  const body = await res.json();
  // Normalize vendor fields to a deterministic schema the agent and policy engine rely on
  return (body.results ?? []).map((r) => ({
    id: r.offerId,
    price: r.totalPrice, // USD
    carrier: r.carrierCode,
    legs: r.legs ?? [],
  }));
}

module.exports = { searchFlights };

Function schema for LLM function-calling

{
  "name": "searchFlights",
  "description": "Search available flights between two airports",
  "parameters": {
    "type": "object",
    "properties": {
      "origin": { "type": "string", "pattern": "^[A-Z]{3}$", "description": "IATA airport code, e.g. SFO" },
      "destination": { "type": "string", "pattern": "^[A-Z]{3}$", "description": "IATA airport code, e.g. LHR" },
      "departDate": { "type": "string", "format": "date" },
      "returnDate": { "type": ["string", "null"], "format": "date" },
      "passengers": { "type": "integer", "minimum": 1, "default": 1 }
    },
    "required": ["origin", "destination", "departDate"],
    "additionalProperties": false
  }
}
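A schema only constrains the model if you enforce it at runtime, because models can still emit malformed arguments. A production setup would typically compile the schema with a JSON Schema validator such as Ajv, plus ajv-formats for `format: date`. The dependency-free sketch below hand-codes comparable rules so the idea is visible: required fields, real ISO dates, a positive integer passenger count, and no unknown keys. It also adds a three-letter IATA code check. `validateSearchArgs` is a hypothetical helper name.

```javascript
// Hypothetical runtime validator for LLM-produced searchFlights arguments.
const ALLOWED_KEYS = new Set(['origin', 'destination', 'departDate', 'returnDate', 'passengers']);
const IATA = /^[A-Z]{3}$/;
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

function isValidDate(s) {
  // Reject strings like 2026-02-30 that match the pattern but are not real calendar dates
  if (typeof s !== 'string' || !ISO_DATE.test(s)) return false;
  const d = new Date(`${s}T00:00:00Z`);
  return !Number.isNaN(d.getTime()) && d.toISOString().slice(0, 10) === s;
}

function validateSearchArgs(args) {
  if (args === null || typeof args !== 'object') return ['arguments must be an object'];
  const errors = [];
  for (const key of Object.keys(args)) {
    if (!ALLOWED_KEYS.has(key)) errors.push(`unexpected property: ${key}`);
  }
  for (const key of ['origin', 'destination']) {
    if (!IATA.test(args[key] ?? '')) errors.push(`${key} must be a 3-letter IATA code`);
  }
  if (!isValidDate(args.departDate)) errors.push('departDate must be YYYY-MM-DD');
  if (args.returnDate != null && !isValidDate(args.returnDate)) errors.push('returnDate must be YYYY-MM-DD or null');
  if (args.passengers !== undefined && !(Number.isInteger(args.passengers) && args.passengers >= 1)) {
    errors.push('passengers must be an integer >= 1');
  }
  return errors; // an empty array means the call may proceed to the adapter
}

module.exports = { validateSearchArgs };
```

When validation fails, return the error list to the model as the tool result rather than calling the vendor. The model can then correct its arguments on the next turn.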

Step 2 — Constrain agent behavior (prompt + policy enforcement)

Don't let the LLM decide free-form. Use a constrained prompt and a policy engine to validate decisions before executing side effects.

// promptTemplate (pseudo)
System: You are a constrained travel booking agent. Use only registered tools. Never call confirmBooking without approval.
User: "Book me a flight from SFO to LHR on 2026-03-05, economy, under $1500."
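Prompt wording alone is a weak control. A complementary step is to never advertise side-effecting tools to the model at all. The sketch below assumes an OpenAI-compatible chat-completions request shape, which many Qwen hosting options offer; check your provider's documentation. The model ID and the `buildAgentRequest` helper are placeholders, and the request is only constructed here, not sent. Only read-only tools are exposed. The orchestrator runs holdBooking after the policy check and confirmBooking after approval.

```javascript
// Hypothetical request builder for an OpenAI-compatible chat-completions endpoint.
const SYSTEM_PROMPT =
  'You are a constrained travel booking agent. Use only the provided tools. ' +
  'You cannot hold or confirm bookings; those steps run only after policy checks and human approval.';

// Read-only tools the model may call. holdBooking and confirmBooking are deliberately absent.
const MODEL_VISIBLE_TOOLS = ['searchFlights', 'getFareRules'];

function buildAgentRequest(userMessage, toolSchemas, model = 'your-qwen-model-id') {
  const tools = toolSchemas
    .filter((schema) => MODEL_VISIBLE_TOOLS.includes(schema.name))
    .map((schema) => ({ type: 'function', function: schema }));
  return {
    model, // placeholder: use the model ID your provider documents
    temperature: 0, // reduces, but does not eliminate, output variability
    messages: [
      { role: 'system', content: SYSTEM_PROMPT },
      { role: 'user', content: userMessage },
    ],
    tools,
    tool_choice: 'auto',
  };
}

module.exports = { buildAgentRequest, MODEL_VISIBLE_TOOLS };
```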

Implement a policy check that validates selected offers against corporate rules:

function enforcePolicy(offer, policy) {
  if (offer.price > policy.maxSpend) return { allowed: false, reason: 'Exceeded spend limit' };
  if (!policy.allowedCarriers.includes(offer.carrier)) return { allowed: false, reason: 'Carrier not allowed' };
  // other checks: layover duration, total travel time
  return { allowed: true };
}
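enforcePolicy stops at the first failing rule. For approver summaries it is often more useful to report every violation at once. The sketch below extends the same idea with the layover and timeframe checks mentioned in the comment. It assumes a hypothetical leg shape `{ departAt, arriveAt }` with ISO-8601 timestamps. It also assumes hypothetical policy fields `maxLayoverMinutes` and `minAdvanceDays`. Adapt both to your vendor's normalized schema.

```javascript
// Hypothetical extended policy check that collects all violations instead of returning on the first.
const MS_PER_MINUTE = 60 * 1000;
const MS_PER_DAY = 24 * 60 * MS_PER_MINUTE;

function evaluatePolicy(offer, policy, now = new Date()) {
  const reasons = [];
  if (offer.price > policy.maxSpend) reasons.push(`Price ${offer.price} exceeds limit ${policy.maxSpend}`);
  if (!policy.allowedCarriers.includes(offer.carrier)) reasons.push(`Carrier ${offer.carrier} not allowed`);

  // Layover = gap between one leg's arrival and the next leg's departure (assumed leg shape)
  const legs = offer.legs ?? [];
  for (let i = 1; i < legs.length; i++) {
    const gapMin = (Date.parse(legs[i].departAt) - Date.parse(legs[i - 1].arriveAt)) / MS_PER_MINUTE;
    if (gapMin > policy.maxLayoverMinutes) reasons.push(`Layover of ${gapMin} min exceeds ${policy.maxLayoverMinutes}`);
  }

  // Timeframe: require booking at least minAdvanceDays before the first departure
  if (legs.length > 0) {
    const daysAhead = (Date.parse(legs[0].departAt) - now.getTime()) / MS_PER_DAY;
    if (daysAhead < policy.minAdvanceDays) {
      reasons.push(`Booked ${Math.floor(daysAhead)} days ahead; minimum is ${policy.minAdvanceDays}`);
    }
  }
  return { allowed: reasons.length === 0, reasons };
}

module.exports = { evaluatePolicy };
```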

Step 3 — Human-in-the-loop and approval workflow

Always require a human reviewer for financial or high-risk actions. The agent should present a compact booking summary and a secure approval link (or Slack/Teams message) to an authorized approver.

// approval flow (high-level)
1. Agent collects options and recommends best offer.
2. Agent calls policy engine; if allowed and below auto-approve threshold, proceed to hold.
3. If above threshold or policy requires, create ApprovalRequest in DB and notify approver.
4. Upon approval, agent confirms booking.
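The four steps above can be expressed as a small routing function plus an ApprovalRequest lifecycle. The names here are illustrative assumptions, not a prescribed data model: `routeOffer`, `autoApproveLimit`, the `travel-approver` role, and the status values. The sketch enforces the earlier design principles in four ways. Policy-denied offers never reach a hold. Only users with the approver role can decide. Requesters cannot approve their own trips. Only an `approved` request unlocks confirmation.

```javascript
// Hypothetical approval routing and ApprovalRequest state transitions.
function routeOffer(offer, policyResult, autoApproveLimit) {
  // Accepts enforcePolicy-style { allowed, reason } or { allowed, reasons: [...] }
  if (!policyResult.allowed) return { action: 'reject', reasons: policyResult.reasons ?? [policyResult.reason] };
  if (offer.price <= autoApproveLimit) return { action: 'hold' }; // step 2: below threshold
  return { action: 'request_approval' }; // step 3: create ApprovalRequest and notify
}

function createApprovalRequest({ id, offer, requesterId }) {
  return { id, offerId: offer.id, price: offer.price, requesterId, status: 'pending', decidedBy: null };
}

function decide(request, approver, decision) {
  if (request.status !== 'pending') throw new Error(`Request ${request.id} is already ${request.status}`);
  if (!approver.roles.includes('travel-approver')) throw new Error('Approver lacks travel-approver role');
  if (approver.id === request.requesterId) throw new Error('Requesters cannot approve their own bookings');
  if (!['approved', 'rejected'].includes(decision)) throw new Error(`Invalid decision: ${decision}`);
  // Return a new object so earlier states remain intact for the audit trail
  return { ...request, status: decision, decidedBy: approver.id, decidedAt: new Date().toISOString() };
}

// Step 4 gate: confirmBooking may run only for an approved request
function canConfirm(request) {
  return request.status === 'approved';
}

module.exports = { routeOffer, createApprovalRequest, decide, canConfirm };
```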

Step 4 — Mocking and sandboxing: don't hit production APIs in tests

Create a local mocking proxy or use the provider's sandbox. The goal is deterministic responses from both the LLM and the tools, so every test run is reproducible. If you're experimenting with on-prem safety or local LLMs, projects like the Raspberry Pi local LLM lab show how to iterate without production dependencies.

Example: local fixture for searchFlights

// tests/fixtures/searchFlights-response.json
{
  "results": [
    { "offerId": "OF123", "totalPrice": 1200, "carrierCode": "AA", "legs": [] },
    { "offerId": "OF124", "totalPrice": 980, "carrierCode": "BA", "legs": [] }
  ]
}
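If you prefer a local mocking proxy over per-test HTTP interception, Node's built-in `http` module is enough to serve the fixture. This is a minimal sketch. The route mirrors the hypothetical `/v1/flights/search` endpoint used by the adapter. The fixture is inlined instead of read from tests/fixtures, and `startMockTravelApi` is an illustrative name. To point the adapter at the mock, you would need to make its base URL configurable, for example via an environment variable. The adapter above hard-codes its base URLs.

```javascript
// Hypothetical local mock of the travel API, built on Node's http module only.
const http = require('http');

// Mirrors tests/fixtures/searchFlights-response.json
const SEARCH_FIXTURE = {
  results: [
    { offerId: 'OF123', totalPrice: 1200, carrierCode: 'AA', legs: [] },
    { offerId: 'OF124', totalPrice: 980, carrierCode: 'BA', legs: [] },
  ],
};

function startMockTravelApi(port = 0) {
  const server = http.createServer((req, res) => {
    const url = new URL(req.url, 'http://localhost');
    if (req.method === 'GET' && url.pathname === '/v1/flights/search') {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(SEARCH_FIXTURE));
      return;
    }
    // Any other call is unexpected: fail loudly so tests notice unmocked endpoints
    res.writeHead(404, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: `No fixture for ${req.method} ${url.pathname}` }));
  });
  return new Promise((resolve) => {
    // port 0 asks the OS for a free port, which avoids clashes in parallel CI jobs
    server.listen(port, '127.0.0.1', () => {
      resolve({ server, baseUrl: `http://127.0.0.1:${server.address().port}` });
    });
  });
}

module.exports = { startMockTravelApi };
```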

Step 5 — Build a test harness

Use layered testing: unit tests for tool adapters, contract tests for API shapes, and end-to-end tests for agent flows. Below are sample tests in Node.js using Jest.

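// tests/searchFlights.test.js
const nock = require('nock');
const { searchFlights } = require('../src/tools/searchFlights');
const fixture = require('./fixtures/searchFlights-response.json');

afterEach(() => nock.cleanAll()); // no interceptor leaks between tests

test('searchFlights returns normalized offers', async () => {
  nock('https://sandbox.travelapi.example.com')
    .get(/\/v1\/flights\/search/) // regex path; slashes must be escaped
    .query(true) // accept any query string
    .reply(200, fixture);

  const offers = await searchFlights({ origin: 'SFO', destination: 'LHR', departDate: '2026-03-05', passengers: 1, apiKey: 'test' });
  expect(offers).toHaveLength(2);
  expect(offers[0]).toEqual({ id: 'OF123', price: 1200, carrier: 'AA', legs: [] });
});

test('searchFlights surfaces API errors', async () => {
  nock('https://sandbox.travelapi.example.com').get(/\/v1\/flights\/search/).query(true).reply(503);
  await expect(searchFlights({ origin: 'SFO', destination: 'LHR', departDate: '2026-03-05', apiKey: 'test' })).rejects.toThrow('503');
});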

For LLM-driven flows, don't call the real model in unit tests. Instead, stub the LLM controller to return deterministic tool-calling traces (simulated function-calling responses).
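A stubbed controller can be as simple as an object that replays a recorded list of tool calls. The sketch below rests on assumptions. The trace format `{ name, arguments }` loosely mirrors function-calling output, where arguments typically arrive as a JSON string. `createReplayLlm` and `runAgentLoop` are hypothetical names. The loop parses arguments, dispatches to injected tool functions, and stops when the trace returns a final message.

```javascript
// Hypothetical replay stub: returns pre-recorded function-calling responses in order.
function createReplayLlm(trace) {
  let step = 0;
  const seen = []; // tool results fed back to the "model", useful for assertions
  return {
    async next(toolResult) {
      if (toolResult !== undefined) seen.push(toolResult);
      if (step >= trace.length) throw new Error('Replay trace exhausted');
      return trace[step++];
    },
    seen,
  };
}

// Minimal orchestration loop: ask the LLM for the next action, run the tool, repeat.
async function runAgentLoop(llm, tools, maxSteps = 10) {
  const calls = [];
  let lastResult;
  for (let i = 0; i < maxSteps; i++) {
    const reply = await llm.next(lastResult);
    if (reply.final) return { final: reply.final, calls };
    const tool = Object.prototype.hasOwnProperty.call(tools, reply.name) ? tools[reply.name] : undefined;
    if (!tool) throw new Error(`Model requested unregistered tool: ${reply.name}`);
    const args = JSON.parse(reply.arguments); // arguments arrive as a JSON string
    calls.push(reply.name);
    lastResult = await tool(args);
  }
  throw new Error('Agent exceeded maxSteps');
}

module.exports = { createReplayLlm, runAgentLoop };
```

Because the trace is plain data, you can record one from a real sandbox session, commit it under tests/fixtures, and replay it on every CI run.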

Step 6 — Contract and integration testing

Use contract testing (Pact or similar) for adapters so vendor API changes are caught early. Schedule a nightly CI job that verifies the contracts against the vendor's sandbox and fails loudly on drift. For integration and billing/audit patterns, see architecting a paid-data marketplace for ideas on audit trails and model accountability.
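Pact and similar tools manage consumer contracts and provider verification for you. For a quick nightly canary against the sandbox, a hand-written shape check can catch breaking renames early. It only needs to cover the fields the adapter actually reads. This is a simplified sketch, not Pact's API. `checkSearchResponseContract` and the field list are assumptions based on the searchFlights adapter.

```javascript
// Hypothetical consumer-side contract: fields searchFlights depends on, with expected types.
const SEARCH_RESULT_CONTRACT = {
  offerId: 'string',
  totalPrice: 'number',
  carrierCode: 'string',
  legs: 'array',
};

function typeOf(value) {
  return Array.isArray(value) ? 'array' : value === null ? 'null' : typeof value;
}

function checkSearchResponseContract(body) {
  if (typeOf(body?.results) !== 'array') return ['results must be an array'];
  const violations = [];
  body.results.forEach((result, i) => {
    for (const [field, expected] of Object.entries(SEARCH_RESULT_CONTRACT)) {
      const actual = typeOf(result?.[field]);
      if (actual !== expected) violations.push(`results[${i}].${field}: expected ${expected}, got ${actual}`);
    }
  });
  return violations; // empty means the sandbox response still matches what the adapter expects
}

module.exports = { checkSearchResponseContract };
```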

Step 7 — Observability, metrics and alerts

Track these metrics:

agent.booking_attempts_total
agent.booking_success_total
agent.approval_rate
api.travel.latency_seconds{vendor}
policy.violations_total

Export traces (OpenTelemetry) and create dashboards and alerts on error rate and approval latency. Outages and high error rates carry real business costs. For a quantified view of why alerting matters, see Cost Impact Analysis: Quantifying Business Loss from Social Platform and CDN Outages.
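Metrics like api.travel.latency_seconds{vendor} are easiest to get right when every adapter call goes through one wrapper. The sketch below uses a tiny in-memory recorder as a stand-in for a real metrics SDK; in production, an OpenTelemetry or Prometheus client would replace it. `instrument`, `createMemoryMetrics`, and the `api.travel.errors_total` counter are hypothetical names, not part of the metric list above.

```javascript
// Hypothetical in-memory metrics recorder; swap for an OpenTelemetry or Prometheus client in production.
function createMemoryMetrics() {
  const counters = new Map();
  const histograms = new Map();
  const key = (name, labels) => `${name}${JSON.stringify(labels)}`;
  return {
    inc(name, labels = {}) {
      const k = key(name, labels);
      counters.set(k, (counters.get(k) ?? 0) + 1);
    },
    observe(name, labels, value) {
      const k = key(name, labels);
      if (!histograms.has(k)) histograms.set(k, []);
      histograms.get(k).push(value);
    },
    counter: (name, labels = {}) => counters.get(key(name, labels)) ?? 0,
    samples: (name, labels = {}) => histograms.get(key(name, labels)) ?? [],
  };
}

// Wrap a tool adapter so every call records latency (success or failure) and errors, labeled by vendor.
function instrument(fn, { vendor, metrics }) {
  return async (...args) => {
    const start = process.hrtime.bigint();
    try {
      return await fn(...args);
    } catch (err) {
      metrics.inc('api.travel.errors_total', { vendor });
      throw err;
    } finally {
      const seconds = Number(process.hrtime.bigint() - start) / 1e9;
      metrics.observe('api.travel.latency_seconds', { vendor }, seconds);
    }
  };
}

module.exports = { createMemoryMetrics, instrument };
```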

Step 8 — Security & compliance checklist

Secrets – store API keys encrypted and rotate them regularly. For hardened secret workflows and vaulting, review the TitanVault workflows writeups: TitanVault Pro & SeedVault.
Principle of least privilege – split API keys: read-only for search, separate key for booking holds. Follow vendor and DB ACL patterns like those described in Mongoose.Cloud security guidance.
Audit log – append-only store with digital signatures or a hash chain for tamper evidence. See patterns in marketplace and data-audit architectures for inspiration.
Data minimization – never persist unnecessary PII; store only tokenized payment placeholders.
Human approvals – require MFA for approvers when confirming bookings. Protect approver channels and use secure notification links.
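For the audit log item, a hash chain makes tampering evident. Each entry stores the SHA-256 hash of the previous entry, so editing any record breaks verification from that point on. This sketch uses Node's built-in crypto module with an in-memory array. It leaves out persistence, signing keys, and canonical JSON serialization, where key order matters for hashing. `createAuditLog` is a hypothetical name. Entries log IDs and decisions rather than traveler PII, in line with data minimization.

```javascript
// Hypothetical append-only audit log with a SHA-256 hash chain for tamper evidence.
const crypto = require('crypto');

const GENESIS = '0'.repeat(64);

function hashEntry(prevHash, event, at) {
  // JSON.stringify is not canonical across key orders; use a canonical serializer in production
  return crypto.createHash('sha256').update(prevHash + JSON.stringify({ event, at })).digest('hex');
}

function createAuditLog() {
  const entries = [];
  return {
    append(event) {
      const prevHash = entries.length ? entries[entries.length - 1].hash : GENESIS;
      const at = new Date().toISOString();
      // Object.freeze is shallow; real immutability must come from the storage layer
      const entry = Object.freeze({ event, at, prevHash, hash: hashEntry(prevHash, event, at) });
      entries.push(entry);
      return entry.hash;
    },
    // Recompute every hash; returns the index of the first broken entry, or -1 if the chain is intact
    verify() {
      let prev = GENESIS;
      for (let i = 0; i < entries.length; i++) {
        const e = entries[i];
        if (e.prevHash !== prev || e.hash !== hashEntry(e.prevHash, e.event, e.at)) return i;
        prev = e.hash;
      }
      return -1;
    },
    entries,
  };
}

module.exports = { createAuditLog };
```

Periodically anchoring the latest hash somewhere outside the log, such as a separate system, lets you detect truncation of the tail as well.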

Advanced strategies (2026 trends)

Leverage these modern patterns that emerged in 2025–2026:

Tool selection policies: use usage telemetry to prefer low-latency vendors.
RAG for policies: store corporate travel policies in a vector DB and use retrieval-augmented prompts to make decisions auditable. See analytics & edge personalization patterns: Edge Signals & Personalization.
Multi-agent choreography: separate agents for search, payments, and notifications, coordinated by a conductor to increase resiliency.
Local LLM safety layer: run a small on-prem safety model to pre-filter LLM outputs before exposing to the orchestrator (helpful under strict compliance regimes introduced in 2026). If you need inexpensive local LLM experimentation, a low-cost Raspberry Pi lab can help you iterate safely: Raspberry Pi 5 + AI HAT+ 2.
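The tool selection item can be as simple as ranking vendors by a tail-latency percentile from recent telemetry. Vendors with too high an error rate are skipped. The sketch assumes a hypothetical per-vendor stats shape `{ latencies: number[], errors, calls }` and uses the nearest-rank percentile method. The thresholds are placeholders.

```javascript
// Hypothetical vendor selection from recent telemetry: lowest p95 latency among healthy vendors.
function percentile(values, p) {
  // Nearest-rank method: the smallest value with at least p% of samples at or below it
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function pickVendor(stats, { maxErrorRate = 0.05, minCalls = 20 } = {}) {
  const candidates = Object.entries(stats)
    // Skip vendors with too little data or an error rate above the threshold
    .filter(([, s]) => s.calls >= minCalls && s.errors / s.calls <= maxErrorRate && s.latencies.length > 0)
    .map(([vendor, s]) => ({ vendor, p95: percentile(s.latencies, 95) }))
    .sort((a, b) => a.p95 - b.p95);
  return candidates.length ? candidates[0].vendor : null; // null: fall back to a configured default
}

module.exports = { percentile, pickVendor };
```

Ranking on p95 rather than the mean keeps a vendor with occasional very slow responses from winning on average latency alone.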

Example end-to-end flow (putting it together)

User: "Find me SFO→LHR, 2026-03-05, economy, under $1500."
LLM Controller calls searchFlights tool (structured JSON call).
Tool adapter returns offers (sandbox fixtures during tests).
Policy engine validates best offer; it's allowed but above auto-approve threshold.
Agent creates ApprovalRequest and notifies approver via Slack with secure link.
Approver reviews, approves. Approval triggers confirmBooking tool which uses a booking-key restricted API key.
Event is logged; user is notified with itinerary and audit reference.

Test harness: sample e2e simulation strategy

For reliable E2E tests, simulate both LLM responses and vendor API responses. Build replay fixtures for the LLM’s function-calling JSON. This lets you verify the orchestration logic without variability.

// tests/e2e-simulated.test.js (pseudo)
1. Seed DB with ApprovalUsers and policies.
2. Stub LLM controller to return function calls for searchFlights -> chooseOffer -> requestApproval.
3. Nock vendor endpoints to return fixtures.
4. Run orchestration; assert ApprovalRequest created and notification sent.
5. Simulate approval webhook; assert confirmBooking called and DB updated.
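Step 5 is easier to simulate, and safer in production, if approval callbacks are signed. One common pattern is an HMAC-SHA256 signature over the raw request body, compared in constant time. The sketch uses only Node's crypto module. The `signApproval`/`verifyApproval` names, the payload fields, and the 15-minute expiry are assumptions for illustration. In the e2e test, sign the simulated webhook with a test secret. In production, load the secret from your secrets store.

```javascript
// Hypothetical HMAC signing and verification for approval webhooks.
const crypto = require('crypto');

function signApproval(rawBody, secret) {
  return crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
}

function verifyApproval(rawBody, signatureHex, secret, { maxAgeMs = 15 * 60 * 1000, now = Date.now() } = {}) {
  const expected = Buffer.from(signApproval(rawBody, secret), 'hex');
  const received = Buffer.from(String(signatureHex), 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first
  if (received.length !== expected.length || !crypto.timingSafeEqual(received, expected)) {
    return { ok: false, reason: 'bad signature' };
  }
  const payload = JSON.parse(rawBody);
  // Reject missing or stale timestamps so an old link or replayed webhook cannot confirm a booking later
  const issued = Date.parse(payload.issuedAt);
  if (!Number.isFinite(issued) || now - issued > maxAgeMs) return { ok: false, reason: 'expired' };
  return { ok: true, payload };
}

module.exports = { signApproval, verifyApproval };
```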

Common pitfalls and how to avoid them

Calling confirmBooking directly — prevent via both prompt constraints and runtime permission checks.
Non-deterministic tests — always mock the LLM and external APIs in CI.
Lack of auditability — choose an append-only store and hash chaining if your compliance team requires it. See patterns in paid-data marketplaces for well-formed audit trails: architecting paid-data marketplaces.
Hard-coded policy values — keep policies in a central store and use versioning for traceability.
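For the last pitfall, a versioned store lets every decision record which policy version it was evaluated against. Auditors usually ask for exactly this. The in-memory sketch below is hypothetical, including `createPolicyStore` and the version numbering. In practice the store would be a database table or a config repository with the same append-only semantics.

```javascript
// Hypothetical versioned policy store: publishing never overwrites, it appends a new version.
function createPolicyStore() {
  const versions = []; // versions[i] holds version i + 1
  return {
    publish(policy, author) {
      const record = Object.freeze({
        version: versions.length + 1,
        policy: structuredClone(policy), // snapshot, so later edits to the caller's object don't leak in
        author,
        publishedAt: new Date().toISOString(),
      });
      versions.push(record);
      return record.version;
    },
    current() {
      if (versions.length === 0) throw new Error('No policy published');
      return versions[versions.length - 1];
    },
    get(version) {
      const record = versions[version - 1];
      if (!record) throw new Error(`Unknown policy version ${version}`);
      return record;
    },
  };
}

module.exports = { createPolicyStore };
```

A booking decision can then carry `policyVersion: store.current().version`. An auditor can later replay it with `store.get(policyVersion).policy`.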

Minimal reproducible repo (quick checklist)

src/tools/* - adapters with sandbox support
src/llm-controller - deterministic wrapper or stub for function-calling outputs
src/policy - policy engine (JSON-based)
tests/fixtures - vendor responses and LLM traces
CI workflow - run unit + contract + e2e-simulated tests on PRs

Small case study: internal travel bot (fictional)

AcmeCorp replaced manual travel desk workflows with a constrained agent in 2025. Results within six months:

Average booking time reduced from 45 minutes to 8 minutes
Policy violations dropped 87%
Audit-ready bookings increased from 10% to 100%

These gains were possible because the team invested in strong testing, human approval design, and vendor sandboxing before production rollout.

Pro tip: Treat the agent as just another API-backed microservice. Apply the same CI, testing, and security controls you'd use for any critical internal service.

Final checklist before production

All tool adapters covered by unit tests and contract tests
Approval flow tested and role-based access enforced
Secrets rotated and scoped — consider vault and key management best-practices like those in the TitanVault writeup.
Observability and alerts configured (track approval latency and policy violations)
Compliance audit trail enabled with retention policy

Takeaways (actionable)

Start small: sandbox, mock LLM outputs, and enable human approvals.
Constrain at every layer: rely on JSON function schemas and runtime policy checks.
Test deeply: unit, contract, and e2e-simulated tests are mandatory.
Monitor aggressively: track booking success, approval latency, and policy violations. Outages are expensive, so plan monitoring accordingly (see the outage cost analysis).

Why this approach aligns with 2026 enterprise needs

In 2026, organizations prioritize demonstrable safety, auditable decisions, and measurable ROI from agentic automation. The constrained agent pattern outlined here mirrors industry direction seen in Qwen's agentic features and broader moves toward controlled, integrated assistants. It balances automation with governance — exactly what regulated enterprises need today.

Next steps — get the repo and try the demo

Ready to run the demo? Clone the starter repo (link in your internal resources or to the open-source sandbox we publish) and follow the README to boot the sandbox API, seed fixtures, and run the test harness. Use the sandbox keys and run CI with the simulated LLM traces before connecting any production credentials.

Call to action

Try it now: Spin up the sandbox, run the test harness, and reach out to the workflowapp.cloud team for an enterprise review of your policy rules and deployment plan. We’ll help you move from proof-of-concept to a secure, auditable booking agent with measurable ROI.
