Privacy Impact Assessment Template for Device-Level LLMs (Siri/Gemini + Desktop Agents)
Practical PIA for device-level LLMs (Siri/Gemini, Cowork): map data flows, manage third-party processors, and design robust consent and audit controls.
Why device-level LLMs (Siri/Gemini, Cowork) break traditional PIAs — and what to do about it
Enterprise IT teams in 2026 face a new privacy reality: assistants like Siri powered by Gemini and desktop agents such as Anthropic's Cowork request direct file-system access, persistent background agents, and hybrid cloud calls. That capability accelerates productivity — but it also multiplies the privacy risk surface across device-level storage, local models, cloud processors, and third parties. This Privacy Impact Assessment (PIA) template is tailored for device- and desktop-integrated LLMs and gives security, compliance, and engineering teams a practical, auditable blueprint to map data flows, evaluate third-party processors, and design consent scenarios that stand up to GDPR, CPRA, and EU AI Act-era scrutiny.
Executive summary — top actions (start here)
- Map every data touchpoint: local files, clipboard, speech transcripts, system metadata, cloud API calls, telemetry.
- Classify processing mode: on-device only, cloud-only, or hybrid (edge inference + cloud for retrieval/fine-tuning).
- List all third-party processors: LLM provider, telemetry/analytics, cloud storage, identity provider (IdP), and any SDKs with file access.
- Design consent layers: onboarding opt-in, per-feature granular toggles (file access, telemetry, continuous learning), and revocation flows.
- Establish technical controls: per-tenant encryption keys, SSO + SCIM for provisioning, SIEM integration, immutable audit logs, and backup encryption.
How to use this PIA template
Use this document as both a template and a checklist. Populate each section with project-specific details, attach an artifact bundle (data-flow diagram, DPA/SCCs, subprocessors list, sample consent UI), and record the assessment score. Keep the PIA versioned and stored in your compliance repository tied to the project lifecycle — updates should be triggered by any model changes, new subprocessors, or feature toggles that alter data flows.
PIA structure — step-by-step
1) Project summary & operational context
Provide a concise description: product name, purpose, intended users, deployment architecture, and timeline. Example fields:
- Project: Cowork-Style Desktop Agent (Research Preview)
- Purpose: Assist knowledge workers by organizing files, synthesizing documents, generating spreadsheets.
- Users: Employees (enterprise), contractors (guest), knowledge workers.
- Deployment: Customer-managed desktop app; hybrid inference—local model + cloud LLM calls for reasoning; optional telemetry to vendor.
- Owner: Product Security, Privacy Officer, Engineering Lead.
2) Data flow mapping (essential)
Map and document every path data can take. Use diagrams, but also list flows in text so auditors can quickly parse them. Include retention, transformation, and categories (PII, sensitive, non-sensitive).
Minimal data-flow template (textual):
- Source: User's local file system (~/Documents/Project-X)
- Capture: Desktop agent reads file on user action or with permissioned background scan
- Processing: Local parsing (on-device) -> feature extraction -> query to cloud LLM (anonymized context/prompt)
- Storage: Local cache (encrypted), cloud storage for persisted sessions (encrypted), backups retained 90 days
- Third parties: LLM provider (API), telemetry/analytics provider, identity provider (SSO), cloud object storage
Simple ASCII data-flow (for quick diagrams):
[User Device: Files, Mic, Clipboard] --> [Desktop Agent (permissioned)] --> {Local Model, Local Cache (enc)}
                                              |--> [Cloud LLM API] --(responses)--> [Desktop Agent] --> [Local UI]
                                              |--> [Telemetry] --> [Analytics Processor]
                                              |--> [Cloud Storage (session backup)]
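The textual template above can also be kept as structured data so flows are diffable and auditable alongside the diagram. The field names below are an illustrative schema, not a standard:

```javascript
// Illustrative data-flow record; field names are an example schema, not a standard.
const flow = {
  source: "local_fs:~/Documents/Project-X",
  capture: "user_action", // or "permissioned_background_scan"
  processing: ["local_parse", "feature_extraction", "cloud_llm_query"],
  storage: { localCache: "encrypted", cloudSession: "encrypted", backupRetentionDays: 90 },
  thirdParties: ["llm_provider", "telemetry", "idp", "object_storage"],
};

// A flow leaves the device if any processing step or listed third party is off-box.
function leavesDevice(f) {
  return f.processing.some((p) => p.startsWith("cloud_")) || f.thirdParties.length > 0;
}
```

Records like this make it trivial to query which flows reach the cloud when a subprocessor changes.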
3) Third-party processors & subprocessors
List all processors and the precise processing they perform. Distinguish between:
- Primary LLM providers (e.g., Google Gemini, Anthropic) — specify whether data is used for model training/retention.
- Cloud infra (object storage, compute) — region, residency guarantees.
- Analytics/telemetry — sampling rate; is PII removed?
- Identity & access (IdP, SSO) — SAML/OIDC, SCIM provisioning.
For each processor capture:
- Legal entity and country
- Processing purpose
- Data categories processed
- Retention period and deletion policy
- Whether they train models or store prompts
- Contracts in place (DPA, SCCs, ADAs)
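The fields above map directly onto a processor-register row. A sketch with hypothetical values, plus a simple escalation check:

```javascript
// Example processor-register entry; the vendor and values are hypothetical.
const processor = {
  name: "ExampleLLM Inc.",
  country: "US",
  purpose: "cloud inference on sanitized prompts",
  dataCategories: ["prompt_text", "session_metadata"],
  retentionDays: 30,
  deletionPolicy: "hard delete on request, verified quarterly",
  trainsOnPrompts: false,
  contracts: ["DPA", "SCCs"],
};

// Flag entries that must be escalated before onboarding.
function needsReview(p) {
  return p.trainsOnPrompts || !p.contracts.includes("DPA");
}
```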
4) Legal basis & consent scenarios (practical templates)
Device-level LLMs typically require multiple legal bases: performance of a service, legitimate interests (for limited telemetry), and explicit consent for sensitive data processing or continuous learning. Enterprise customers often prefer consent-like explicit configuration via policy.
Common consent scenarios to capture in the PIA:
- Onboarding consent: enable assistant, grant file access, enable microphone.
- Feature-level consent: allow cloud calls, save session history to cloud, allow vendor to use data to improve models.
- Admin controls for enterprises: SSO enforced, telemetry disallowed, model updates blocked.
Sample concise consent text (UI copy):
"Allow Cowork to read files you select and use synthesized data to generate responses. You can choose Local Only (no cloud calls) or Hybrid (local + cloud). Cloud processing will be encrypted in transit and at rest. You may revoke access anytime in Settings."
Consent capture example (JSON record):
{
  "userId": "alice@example.com",
  "consentVersion": "2026-01-01",
  "scopes": {
    "file_access": true,
    "mic_recording": false,
    "cloud_processing": "hybrid",
    "model_improvement": false
  },
  "timestamp": "2026-01-18T12:34:56Z"
}
Server-side endpoint pseudocode to persist consent (Node.js/Express):
app.post('/api/consent', authenticate(), async (req, res) => {
  const { scopes, consentVersion } = req.body;
  if (!scopes || !consentVersion) return res.status(400).send({ error: 'invalid consent payload' });
  // Stamp server-side and link to the authenticated identity so the record is not client-controlled
  await db.insert('consents', { userId: req.user.id, consentVersion, scopes, timestamp: new Date().toISOString() });
  res.status(200).send({ ok: true });
});
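Revocation should be a new record in an append-only log, never an update to the original grant, so auditors can replay the full consent history. A minimal in-memory sketch (a real store would be a database with immutability guarantees):

```javascript
// Append-only consent log sketch: revocation is a new record, never a mutation.
const consentLog = [];

function recordConsent(userId, scopes, action = "grant") {
  const record = Object.freeze({
    userId,
    action, // "grant" or "revoke"
    scopes,
    timestamp: new Date().toISOString(),
  });
  consentLog.push(record); // append-only: no updates, no deletes
  return record;
}

// Effective consent is the latest record for a user.
function effectiveConsent(userId) {
  const records = consentLog.filter((r) => r.userId === userId);
  return records.length ? records[records.length - 1] : null;
}
```

This shape also gives you the "sample logs demonstrating consent capture and revocation" that auditors ask for.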
5) Risk assessment & scoring (practical matrix)
Use a simple risk score (1-5) for Likelihood and Impact, then compute risk = Likelihood x Impact (1-25). Prioritize anything above 12.
- High risk examples: Unrestricted file-system background scanning; cloud storage of sensitive corporate IP without contractual restrictions; third-party training on raw prompts.
- Medium risk examples: Telemetry sampling that might contain contextual PII; hybrid inference sending contextual snippets to cloud LLMs.
- Low risk examples: On-device-only inference with no telemetry and strong OS sandboxing.
Sample mitigation mapping:
- Risk: Desktop agent sends raw docs to cloud LLM (Impact 5, Likelihood 4 = 20) —> Mitigations: prompt sanitization, token minimization, encryption in transit, contractual prohibition on training, DPA clause.
- Risk: Telemetry contains email addresses (Impact 4, Likelihood 3 = 12) —> Mitigations: client-side redaction, hashing, sampling, pseudonymization.
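The matrix is simple enough to compute mechanically in your risk spreadsheet or tooling. A sketch using the two example risks above:

```javascript
// Risk = likelihood x impact, each scored 1-5; anything above 12 is prioritized.
function scoreRisk({ name, likelihood, impact }) {
  const score = likelihood * impact;
  return { name, score, priority: score > 12 ? "prioritize" : "monitor" };
}

const scored = [
  { name: "raw docs sent to cloud LLM", likelihood: 4, impact: 5 },
  { name: "telemetry contains email addresses", likelihood: 3, impact: 4 },
].map(scoreRisk);
```

Note that a score of exactly 12 sits at the boundary, which is why the telemetry example is classed as medium risk above.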
6) Technical & organizational measures (TOMs)
Match measures to risk and regulatory requirements. Below are prescriptive controls for device-LLMs in 2026.
Identity and access
- Require SSO (SAML/OIDC) for enterprise deployments; enforce MFA via IdP.
- Use SCIM for provisioning and deprovisioning; automate offboarding.
- Role-based access control for admin and agent settings; per-feature admin lockdown.
Data protection
- Encrypt local caches and backups at rest (AES-256) and in transit (TLS 1.3).
- Per-tenant keys with KMS-backed envelopes; use customer-managed keys (CMKs) when required.
- Minimize data sent to cloud: use extractive prompts, redact or hash PII before sending.
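A minimal client-side redaction pass before any cloud call might look like the sketch below. The regexes catch only obvious patterns and are illustrative; a real deployment would use a vetted PII-detection library:

```javascript
// Illustrative client-side redaction before sending a prompt to a cloud LLM.
// These regexes catch only obvious patterns; treat this as a sketch, not a
// complete PII detector.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const SSN = /\b\d{3}-\d{2}-\d{4}\b/g;

function redact(prompt) {
  return prompt.replace(EMAIL, "[EMAIL]").replace(SSN, "[SSN]");
}
```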
Multi-tenant & scaling
- Logical tenant isolation in databases (row/column-level encryption or separate schemas).
- Per-tenant rate limits and quotas to prevent cross-tenant inference contamination.
- Automated provisioning pipelines that enforce tenant-specific privacy defaults (telemetry off by default for regulated industries).
Backups & retention
- Encrypted backups with retention policy aligned to business/contractual requirements (default 90 days for session history unless customer elects longer). Consider cloud infra choices and residency guarantees during vendor selection.
- Tested deletion workflows: object lifecycle + tombstone + periodic verification to prove deletion requests were honored.
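The periodic verification step can be sketched as a sweep over tombstones that checks whether each deleted object is actually gone. Here a plain `Map` stands in for the object store; the record shape is hypothetical:

```javascript
// Deletion-verification sketch: return IDs that were tombstoned (marked for
// deletion) but still exist in storage. `storage` is a Map standing in for
// an object store.
function verifyDeletions(tombstones, storage) {
  return tombstones.filter((t) => storage.has(t.objectId)).map((t) => t.objectId);
}

const storage = new Map([["sess-2", "ciphertext"]]); // sess-1 deleted, sess-2 leaked
const tombstones = [{ objectId: "sess-1" }, { objectId: "sess-2" }];
```

Any ID the sweep returns is proof a deletion request was not honored and should raise an alert.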
Observability & audit
- SIEM integration for security events (failed authorization attempts, unexpected network calls to unknown domains).
- Immutable audit logs for consent changes, admin actions, and processor onboarding.
7) Audit, monitoring & evidence for regulators
Audits in 2026 expect evidence, not just claims. Prepare:
- Data flow diagrams and change history
- DPA and subprocessors register with timestamps and SCCs where necessary
- Sample logs demonstrating consent capture and revocation
- Pen-test and model safety evaluation reports (including adversarial prompt testing)
- Privacy-by-design decisions and threat models
8) Data subject rights: practical flows
Implement end-to-end flows for:
- Access — export sessions and prompts in machine-readable form
- Rectification — allow users to correct metadata or session notes
- Deletion — remove local caches, cloud sessions, and instruct processors to delete (log proof)
- Portability — provide JSON/CSV and an option to transfer sessions to a new service
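Access and portability share one building block: a machine-readable export scoped to the requesting user. A sketch, assuming sessions carry a `userId` field:

```javascript
// Sketch of a machine-readable export for access/portability requests.
// The session shape is hypothetical.
function exportSessions(userId, sessions) {
  const owned = sessions.filter((s) => s.userId === userId);
  return JSON.stringify({ userId, exportedAt: new Date().toISOString(), sessions: owned }, null, 2);
}

const sessions = [
  { userId: "alice@example.com", id: "sess-1", prompts: ["summarize Q3 notes"] },
  { userId: "bob@example.com", id: "sess-2", prompts: [] },
];
```

The same filter is the natural place to enforce that one user's export can never include another tenant's sessions.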
9) Incident response & breach notification
Define playbooks specific to device-level agents:
- Rapid isolation: remote configuration to disable cloud calls and telemetry.
- Forensic capture: collect agent logs, network captures, and affected device identifiers while preserving user privacy.
- Notification thresholds: align to GDPR 72-hour rule and local sectoral laws; provide clear remediation steps and evidence logs.
10) Deployment checklist for device/desktop LLMs
- Confirm consent UI and immutable consent records implemented.
- Validate per-feature admin toggles and tenant default policies.
- Verify encryption keys and CMK config for each tenant.
- Test data minimization pipeline (client-side redaction).
- Confirm DPA and subprocessors documented; sign SCCs as applicable.
- Run integration tests that assert local-only mode does not reach cloud APIs.
- Document and store the PIA artifact set in compliance repo, with version and sign-off by Privacy Officer.
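The checklist item asserting that local-only mode never reaches cloud APIs can be tested by injecting a recording stub for the agent's network layer. This sketch assumes the agent routes all HTTP through an injectable client; `runAgent` and its config shape are hypothetical stand-ins for your agent's entry point:

```javascript
// Recording stub: captures every outbound URL instead of making real calls.
function makeRecordingClient() {
  const calls = [];
  return { calls, fetch: (url) => calls.push(url) };
}

// Hypothetical agent entry point: it should only call out when cloud
// processing is enabled in config.
function runAgent(config, httpClient) {
  if (config.cloudProcessing !== "local_only") {
    httpClient.fetch("https://llm.example.com/v1/complete"); // hypothetical endpoint
  }
}
```

Run this in CI for both modes: local-only must record zero calls, hybrid must record the expected ones and nothing else.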
11) Example PIA summary (two short case studies)
Case: Cowork-style desktop agent (enterprise preview)
Findings: High risk if default background scanning is enabled. Third-party LLM provider accepts prompts and policy must prohibit training on customer prompts. Mitigations: background scanning disabled by default, explicit per-folder opt-in, prompt truncation and client-side PII redaction, signed DPA with vendor, telemetry sampling disabled for finance customers. Residual risk: medium; require quarterly review.
Case: Siri using Gemini hybrid model in enterprise-managed devices
Findings: Hybrid design sends summarized context to Gemini for complex queries. Risks center on contextual leakage and cross-domain indexing. Mitigations: enforce enterprise-only Gemini endpoint with contractual processing limits, use SCIM to limit which users can enable advanced features, maintain per-tenant keying, and provide audit trail of all prompts sent to cloud. Residual risk: low-to-medium when safeguards enabled; monitor for prompt-privacy regressions post update.
Practical artifacts to attach to the PIA
- Data Flow Diagram (DFD) per OWASP or STRIDE
- Processor Register (name, purpose, country, DPA signed)
- Consent UI screenshots & text versions
- Consent JSON schema and sample records
- Retention & deletion SOPs
- Risk scoring spreadsheet and remediation timeline
Actionable takeaways (what to implement this week)
- Run a rapid data-flow discovery on one desktop agent to find all file access APIs it calls.
- Turn off background scanning and set per-folder opt-in by default.
- Require SSO for enterprise installs and enforce telemetry opt-out for regulated tenants.
- Ask each LLM provider in writing whether they retain prompts and for what purposes; get that in the DPA.
- Create immutable consent records and expose an admin revocation API for customer security teams.
Why this matters in 2026 — trends & future predictions
Late 2025 and early 2026 saw major shifts: Apple’s integration of Google’s Gemini into Siri and Anthropic’s Cowork preview pushed models from servers into intimate device contexts. Regulators are now combining AI-specific obligations with traditional data protection frameworks: expect more frequent audits, higher standards for demonstrable data minimization, and restrictions on vendors training models with customer prompts. Architectures that fail to isolate tenant data or lack per-tenant keying will be increasingly costly to remediate.
In short: device access = productivity. Device access without containment = regulatory and reputational risk.
Final recommendations & call-to-action
Start this PIA now. Use the checklist above, attach the required artifacts, and schedule a cross-functional review (Privacy, Legal, Security, Product, and Engineering). For enterprise buyers evaluating device-level LLM vendors, require the following before procurement:
- Signed DPA with explicit non-training clause for prompts unless customer consents
- Subprocessor list with 30-day notice and right to object
- Support for customer-managed keys and regional data residency
- Enterprise admin controls to disable cloud processing and telemetry
If you want a pre-filled PIA package tailored to your architecture (Siri/Gemini integrations, Cowork-style agents, or custom desktop LLMs), request our template bundle: it includes a model DFD, consent JSON schema, processor register CSV, and remediation roadmap you can drop into your compliance repo. Click the link on this page to download the bundle or contact our team for a workshop to run this PIA in a day.