Rapid Prototyping with Autonomous Agents: Build a Desktop Assistant That Automates Repetitive Dev Tasks
Pain point: You and your team waste hours on branch hygiene, changelog drafting, and repetitive repository maintenance. You want automation without handing an AI full access to your desktop, secrets, or CI. This tutorial shows how to prototype a desktop autonomous agent (in the spirit of Cowork) that automates branch checks and changelog generation while keeping safety controls, local APIs, and human approvals front and center.
Why build this in 2026?
Late 2025 and early 2026 accelerated two trends: more capable agentic LLMs and a wave of desktop-first assistants (Anthropic's Cowork research preview is an example). At the same time, enterprises demand predictable security and governance. The result: practical patterns for developer automation using local APIs, ephemeral credentials, and explicit permission manifests.
What you'll get from this tutorial
- A minimal architecture to prototype a desktop agent that runs locally.
- Code examples for a git-checker and a changelog generator (Node/Python).
- Concrete safety controls: permission manifest, sandboxing, approval flows.
- Hardening and production next steps for teams.
High-level architecture
Keep the design minimal and auditable. The prototype uses four layers:
- Desktop UI — Electron or native UI: displays actions, gets approvals.
- Agent Kernel — stateless process that queries an LLM to plan and author text (changelogs).
- Tool Adapters — deterministic scripts that perform operations (git status, commit parsing, push, PR creation).
- Safety Layer — permission manifest, audit log, approval prompt, and sandbox boundary.
Key principle: keep the agent's generative reasoning separate from side-effecting tools. LLMs plan and format; tool adapters execute and always pass safety checks first.
Prerequisites
- Node.js 18+ (for the Electron UI and local API gateway)
- Python 3.10+ (for small agent tooling and wrappers) — optional, you can keep everything in Node
- Git CLI installed
- LLM API access (Anthropic Claude or OpenAI in 2026) or a local LLM runtime
- Basic familiarity with Docker/Podman for sandboxing
Step 1 — Project scaffold
Create a folder and initialize a minimal repo:
mkdir ai-desktop-agent
cd ai-desktop-agent
git init
npm init -y
Add two folders: /ui for the Electron UI and /agent for the backend agent and tool adapters.
Step 2 — Define the permission manifest
Start with a small declarative manifest that lists exactly what the agent can do. This file is audited and shown in the UI when the user installs the agent.
---
name: dev-helper-agent
version: 0.1.0
permissions:
  - git:read
  - git:branch-check
  - changelog:generate
  - network:pull-request:create
whitelist_paths:
  - "./repos/*"
user_approval_required_for:
  - git:push
  - network:pull-request:create
rate_limits:
  api_calls_per_minute: 20
Why this matters: the manifest enforces least privilege. The agent must never assume irrevocable rights (like pushing to remote) without explicit user consent.
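The manifest check itself can be a few lines of code. Here is a minimal Python sketch; the manifest is an inline dict mirroring the YAML above (in practice you would parse the YAML file, e.g. with PyYAML), and the function names are illustrative rather than a fixed API:

```python
# Minimal sketch of a manifest policy gate. MANIFEST mirrors the YAML above;
# policy_allows and needs_approval are hypothetical helper names.
MANIFEST = {
    "permissions": [
        "git:read", "git:branch-check",
        "changelog:generate", "network:pull-request:create",
    ],
    "user_approval_required_for": ["git:push", "network:pull-request:create"],
}

def policy_allows(action: str) -> bool:
    """An action is permitted only if the manifest declares it explicitly."""
    return action in MANIFEST["permissions"]

def needs_approval(action: str) -> bool:
    """Declared side-effecting actions still require a signed user approval."""
    return action in MANIFEST["user_approval_required_for"]

print(policy_allows("git:read"))    # declared capability
print(policy_allows("git:push"))    # never granted implicitly
print(needs_approval("network:pull-request:create"))
```

Note the asymmetry: `git:push` is absent from `permissions` but present in `user_approval_required_for`, so even a future grant of that capability would still pass through the approval gate.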
Step 3 — Implement tool adapters
Tool adapters are small, deterministic programs. They are the only processes that change state. Below are two: git-checker and changelog-generator.
git-checker (Node example)
// git-checker: read-only adapter. execFileSync passes arguments as an array,
// so no shell is involved and a hostile repoPath cannot inject commands.
const { execFileSync } = require('child_process')

function getBranchStatus(repoPath = '.') {
  return execFileSync('git', ['-C', repoPath, 'status', '--porcelain=v1', '-b'], { encoding: 'utf8' })
}

function listBranches(repoPath = '.') {
  const out = execFileSync('git', ['-C', repoPath, 'branch', '--format=%(refname:short)'], { encoding: 'utf8' })
  return out.split('\n').filter(Boolean)
}

module.exports = { getBranchStatus, listBranches }
changelog-generator (Python + LLM)
This adapter queries the LLM to format commit messages into a conventional changelog. The adapter passes the commit list and a strict instruction to the model to avoid running commands.
import os
import subprocess

LLM_API_KEY = os.environ.get('LLM_API_KEY')

def list_commits(repo_path='.', since_tag='v0.1.0'):
    cmd = ['git', '-C', repo_path, 'log', f'{since_tag}..HEAD', '--pretty=%h %s (%an)']
    out = subprocess.check_output(cmd, encoding='utf8')
    return out.strip().splitlines()

def generate_changelog(commits):
    prompt = (
        'You are a strict formatter. Given the following commits, produce a '
        'Markdown changelog grouped by type (Added, Changed, Fixed). '
        'Do not propose or run any shell commands. Output only the changelog.'
    )
    # Call your LLM here. Placeholder pseudo-call:
    llm_response = call_llm_api(prompt + '\n\n' + '\n'.join(commits))
    return llm_response

# call_llm_api should be implemented using your provider's SDK
Important: tool adapters must validate inputs and never accept arbitrary code from the LLM. They should sanitize file paths and refuse operations outside whitelisted paths.
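The path check deserves a concrete sketch. `Path.resolve()` collapses `..` segments and symlinks, so inputs like `./repos/app/../../etc` are rejected before any adapter touches the filesystem. The whitelist root and helper name below are illustrative:

```python
# Sketch of the whitelist check every adapter runs before filesystem access.
# WHITELIST and is_path_allowed are hypothetical names; resolve() normalizes
# ".." and symlink tricks so escapes are caught.
from pathlib import Path

WHITELIST = [Path("./repos").resolve()]

def is_path_allowed(candidate: str) -> bool:
    resolved = Path(candidate).resolve()
    return any(resolved == root or root in resolved.parents
               for root in WHITELIST)

print(is_path_allowed("./repos/my-app"))          # inside the whitelist
print(is_path_allowed("./repos/../secrets.txt"))  # ".." escape is rejected
print(is_path_allowed("/etc/passwd"))             # absolute path outside root
```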
Step 4 — Agent kernel: plan, propose, request
The agent kernel is an orchestrator that follows a simple loop:
- Observe — call local read-only tools (git status, branch list).
- Plan — ask the LLM to propose actions (e.g., "generate changelog draft for commits since v0.1.0").
- Review — run static safety checks on the plan, compare against manifest.
- Request Approval — if the plan includes side effects, prompt the user with diffs and an approval UI.
- Execute — run tool adapters for approved actions and record audit log.
# Pseudocode
observe = tools.git.get_status(repo)
plan = llm.ask("Given the status, propose a list of actions limited to "
               "changelog generation and branch checks.")
if not policy.allows(plan):
    reject()
if plan.includes_side_effects():
    user_approval = ui.prompt_for_approval(plan.summary)
    if not user_approval:
        abort()
execute(plan)
log.audit(plan, result)
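The same loop can be made runnable with stubbed dependencies. Everything here (`Plan`, `run_loop`, the action sets) is a hypothetical stand-in for your LLM client, policy engine, and UI, kept minimal to show the three exit paths:

```python
# Runnable sketch of the review/approve/execute gate. Plan and run_loop are
# illustrative names; the action sets mirror the manifest from Step 2.
from dataclasses import dataclass

@dataclass
class Plan:
    actions: list
    ALLOWED = {"changelog:generate", "git:branch-check"}
    SIDE_EFFECTS = {"git:push", "network:pull-request:create"}

    def allowed(self):
        # every action must map to a declared capability
        return all(a in self.ALLOWED or a in self.SIDE_EFFECTS
                   for a in self.actions)

    def includes_side_effects(self):
        return any(a in self.SIDE_EFFECTS for a in self.actions)

def run_loop(plan: Plan, approve) -> str:
    if not plan.allowed():
        return "rejected"   # undeclared capability: never reaches the user
    if plan.includes_side_effects() and not approve(plan):
        return "aborted"    # user declined the approval prompt
    return "executed"       # adapters run; audit log written here

print(run_loop(Plan(["changelog:generate"]), approve=lambda p: False))
print(run_loop(Plan(["git:push"]), approve=lambda p: False))
print(run_loop(Plan(["definitely-not-declared"]), approve=lambda p: True))
```

Read-only plans execute without prompting; side-effecting plans require the approval callback; anything outside the manifest is rejected before the user is even asked.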
Step 5 — Safe local API surface
Expose a minimal localhost API (Express/FastAPI). Bind to 127.0.0.1 only. Use short-lived tokens created by the UI and stored in an OS credential store. Here is an example endpoint list:
- GET /tools/git/status — read-only
- GET /tools/git/branches — read-only
- POST /tools/changelog/generate — returns a changelog draft (no pushes)
- POST /actions/execute — executes an approved action only if signed by the UI
Security checklist for the API:
- Bind to loopback (127.0.0.1).
- Require an ephemeral, per-session token stored in the OS keychain.
- Whitelists: restrict repo paths that the API will operate on.
- Audit: every action logged locally (signed entries) and optionally shipped to your observability backend.
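Token issuance and verification for the loopback API fit in a few lines of stdlib Python. This is a sketch assuming the UI generates the token at session start and stores it in the OS keychain; `hmac.compare_digest` avoids timing side channels when the API checks incoming requests:

```python
# Sketch of per-session token handling for the localhost API.
# issue_session_token / verify_token are illustrative names.
import hmac
import secrets

def issue_session_token() -> str:
    # 32 bytes of randomness, URL-safe; rotate on every UI session
    return secrets.token_urlsafe(32)

def verify_token(presented: str, expected: str) -> bool:
    # constant-time comparison prevents timing attacks on the token
    return hmac.compare_digest(presented.encode(), expected.encode())

token = issue_session_token()
print(verify_token(token, token))       # matching token accepted
print(verify_token("guessed", token))   # anything else rejected
```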
Step 6 — Human-in-the-loop approval UI
Design the UI around confirmation and transparency. For any operation that mutates state (git push, open PR, create files):
- Show the diff or the proposed PR body.
- Show which manifest permission permits the action.
- Highlight side effects (network calls, writes outside whitelist).
- Require an explicit, time-limited approval (e.g., "Approve for 10 minutes").
Example: generate changelog and propose PR
Walkthrough of how the agent operates against a repo:
- The agent calls /tools/git/branches and sees multiple feature branches and a release tag v1.2.0.
- The agent lists commits since v1.2.0 and asks the LLM to produce a changelog draft.
- The agent places the draft into a new file, CHANGELOG-draft.md, and displays the diff in the UI.
- The user reviews and approves the PR creation. The UI signs the execution request, and /actions/execute runs a tool adapter to create a branch, commit the changelog, and open a pull request using the configured remote token.
# Example approval payload (signed by UI)
{
  "action": "create_pr",
  "repo": "./repos/my-app",
  "branch": "agent/changelog-v1.3.0",
  "files": [ { "path": "CHANGELOG.md", "content": "..." } ],
  "expires_at": "2026-01-18T12:00:00Z",
  "signed_by": "user@example.com",
  "signature": "BASE64_SIGNATURE"
}
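Verifying a payload like this means two checks: the expiry has not passed, and the signature matches the canonical body. The sketch below assumes an HMAC shared key held in the OS keychain (a production build might prefer asymmetric signatures); key, names, and the ISO timestamp format are illustrative:

```python
# Sketch of approval verification: expiry check plus HMAC over canonical JSON.
# UI_KEY is a placeholder; load the real key from the OS keychain.
import base64
import hashlib
import hmac
import json
from datetime import datetime, timezone

UI_KEY = b"demo-shared-key"

def sign(payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True).encode()  # canonical form
    return base64.b64encode(hmac.new(UI_KEY, body, hashlib.sha256).digest()).decode()

def verify(payload: dict, signature: str, now: datetime) -> bool:
    if now > datetime.fromisoformat(payload["expires_at"]):
        return False                                # approval has expired
    return hmac.compare_digest(sign(payload), signature)

payload = {"action": "create_pr", "repo": "./repos/my-app",
           "expires_at": "2026-01-18T12:00:00+00:00"}
sig = sign(payload)
print(verify(payload, sig, datetime(2026, 1, 18, 11, 0, tzinfo=timezone.utc)))  # valid
print(verify(payload, sig, datetime(2026, 1, 18, 13, 0, tzinfo=timezone.utc)))  # expired
print(verify({**payload, "repo": "./repos/other"}, sig,
             datetime(2026, 1, 18, 11, 0, tzinfo=timezone.utc)))                # tampered
```

Because the signature covers the whole payload, changing any field (repo, branch, file contents) invalidates the approval, and the expiry makes a captured approval useless after its window.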
Safety patterns and hardening
These are practical rules you must include for any real-world prototype moving to production:
- Manifest-based least privilege: Always require explicit capability declarations (as above).
- Read-only by default: Keep all agent endpoints read-only until the user approves write operations.
- Signed approvals: Require UI-signed requests with expiry timestamps for any action.
- Sandboxing: Run adapters in ephemeral containers with mounted whitelisted paths only — practice sandboxing and isolation.
- Secrets handling: Never expose raw secrets to the LLM. Use a secrets manager and temporary scoped tokens for actions like pushing or opening PRs.
- Rate and cost controls: throttle LLM calls (agent manifest rate_limits) and batch non-urgent operations.
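The manifest's `api_calls_per_minute` limit maps naturally onto a token bucket. This is an in-memory sketch with illustrative names; a multi-process agent would persist the bucket state:

```python
# Minimal token-bucket sketch for the manifest's api_calls_per_minute limit.
# In-memory only; a real agent shares state across adapter processes.
import time

class RateLimiter:
    def __init__(self, calls_per_minute: int):
        self.capacity = calls_per_minute
        self.tokens = float(calls_per_minute)
        self.rate = calls_per_minute / 60.0      # tokens refilled per second
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = RateLimiter(calls_per_minute=20)
print(sum(limiter.allow() for _ in range(25)))   # only 20 of a 25-call burst pass
```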
- Audit logs: write tamper-evident logs locally and centralize them to SIEM for teams — follow the zero-trust storage playbook for provenance and governance.
- Explicit deny-lists: disallow patterns like ~/.ssh, /etc, or other sensitive locations.
Testing and validation
Before running across real repos, create synthetic repositories with predictable commit history and run unit tests for each adapter.
- Unit test: git-checker returns expected branch list for cloned minimal repo.
- Integration test: changelog-generator produces consistent headings for commits with prefixes "feat", "fix", "docs".
- Security test: verify the agent cannot write to disallowed paths and that signed approvals are required for state changes.
Production considerations for teams
If you plan to adopt this pattern in a team or enterprise, treat the prototype as a controlled integration project. Key topics you will need to address:
- Governance: enforced manifests and organization-wide policies (SaaS or self-hosted policy engine).
- Secrets lifecycle: short-lived tokens, OIDC for remote push/pull, Vault integration.
- Monitoring: metrics for API calls, LLM usage, and human approvals. Hook into Prometheus/Grafana.
- Compliance: data residency for model inputs (avoid sending sensitive repo content to third-party LLMs unless reviewed).
- On-device models: in 2026, many orgs opt to run local LLMs to keep repo content in-house — consider containerized, local-first runtimes and sync appliances.
Advanced strategies (2026-forward)
Make your agent more robust and cost-effective with these techniques:
- Plan summaries instead of chain-of-thought: use short structured plans to reduce token usage and avoid leaking internal reasoning.
- Tool-use stubs: maintain strict JSON schemas for tool inputs and outputs to let the LLM reason in structured form.
- Local embeddings: store repository context as local embeddings to let the agent fetch only relevant context rather than sending entire files.
- Batching: aggregate multiple changelog requests into one call to the LLM during low-cost windows.
- Model selection: use lightweight local models for routine formatting and reserve powerful models for complex summarization.
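The tool-use stub idea from the list above can be sketched as a strict parser: the LLM must return JSON matching an exact shape, and anything else is discarded before an adapter runs. The schema, field names, and helper below are illustrative; a real project might use jsonschema or pydantic instead:

```python
# Sketch of a strict tool-input stub. TOOL_SCHEMA and parse_tool_call are
# hypothetical names; extra keys, missing keys, or wrong types all reject.
import json

TOOL_SCHEMA = {
    "tool": str,        # e.g. "changelog.generate"
    "repo": str,        # whitelisted repo path
    "since_tag": str,   # tag to diff from
}

def parse_tool_call(raw: str):
    """Return the parsed call if it matches the schema exactly, else None."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or set(call) != set(TOOL_SCHEMA):
        return None   # extra or missing keys: no room for smuggled fields
    if not all(isinstance(call[k], t) for k, t in TOOL_SCHEMA.items()):
        return None
    return call

ok = parse_tool_call('{"tool": "changelog.generate", "repo": "./repos/my-app", "since_tag": "v1.2.0"}')
bad = parse_tool_call('{"tool": "shell", "cmd": "rm -rf /"}')
print(ok is not None, bad is None)
```

Rejecting unknown keys outright (rather than ignoring them) is the important choice: it leaves the model no side channel for smuggling extra instructions into an adapter.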
Small case study: from prototype to team rollout
One engineering team in late 2025 prototyped an agent that scanned feature branches and proposed stale-branch cleanup. They followed the manifest pattern and required approvals for deletion. Results after 3 months:
- 70% fewer stale branches older than 90 days.
- PR churn reduced because the agent opened changelogs as drafts rather than pushing directly.
- Security incidents: zero, because all pushes required signed approval and tokens rotated daily.
Lesson: autonomy does not mean autonomy from humans. Agents scale productivity when paired with clear governance and human-in-the-loop controls.
Common pitfalls and how to avoid them
- Overtrusting LLM outputs: never execute commands returned by the model. Use adapters with strict schemas.
- Leaking secrets: sanitize any data sent to external models. Prefer local LLM inference for sensitive repos.
- Ambiguous approvals: show diffs and exact actions in the approval prompt; make approvals time-limited and scoped.
- No audit trail: always record signed audit logs for traceability and incident response — follow the zero-trust storage playbook approach.
Actionable checklist to prototype in a day
- Scaffold the repo and manifest (30 mins).
- Implement git-checker and changelog-generator adapters (2–3 hours).
- Wire a local API and a simple Electron UI that can show diffs and sign approvals (2–3 hours).
- Integrate an LLM call (pick a low-cost model for drafts) and test with synthetic repos (1–2 hours).
- Run security tests: path whitelist, signed approvals, and sandboxed execution (1–2 hours).
Future predictions (2026 and beyond)
Expect three major shifts over the next 18 months:
- OS-level agent permissions: operating systems will standardize permission prompts for agentic apps, similar to location/camera permissions today.
- Standardized agent manifests: the industry will converge on declarative manifests (like the one used above) so security tooling can validate capabilities before installation.
- Hybrid on-device models: teams will run compact models locally and reserve powerful cloud models only when needed, reducing data exposure and cost.
Key takeaways
- Separation of concerns: keep LLM reasoning and side-effecting tools separate and auditable.
- Least privilege: use a manifest to limit agent capabilities and require explicit approvals for dangerous actions.
- Local-first and auditable: bind to loopback, use ephemeral tokens, and keep logs immutable.
- Test and iterate: start with read-only features (like changelog drafts) and add step-by-step write approvals.
Next steps & call-to-action
Ready to prototype your own autonomous desktop agent? Clone our starter repo (contains the manifest, example adapters, and a minimal Electron UI) and run the checklist above in a sandboxed project. If you're building for a team, schedule a short security review to align on manifests, token lifetimes, and sandboxing policies.
Try it now: download the starter template, run the synthetic repo tests, and share results with your team. For deeper help — architecture reviews, security hardening, or production rollouts — contact newservice.cloud for a workshop that converts prototypes to safe production agents.
Related Reading
- Advanced Strategy: Hardening Local JavaScript Tooling for Teams in 2026
- Observability & Cost Control for Content Platforms: A 2026 Playbook
- The Zero-Trust Storage Playbook for 2026: Provenance & Access Governance
- Field Review: Local-First Sync Appliances for Creators — Privacy, Performance, and On-Device AI