
Show HN: Desktopmcp – MCP server for the Linux desktop

Desktopmcp, a new MCP server for the Linux desktop, gives AI models access to the desktop through 144 tools via XDG Desktop Portals, AT-SPI, and D-Bus, enabling visual and semantic interaction with the UI. The open-source tool runs on Wayland compositors and requires user permission for sensitive operations.

Published Jun 24, 2026 · 14 min read

MCP server for the Linux desktop. It gives AI models access to the Linux desktop.

desktopmcp connects AI assistants to the Linux desktop through three system interfaces: XDG Desktop Portals for sandboxed desktop operations, AT-SPI for semantic UI understanding, and D-Bus for low-level system access. It exposes 144 tools over the Model Context Protocol, letting an AI see what's on screen, understand UI structure, move the cursor, click buttons, type text, manage files, and interact with desktop services.

AI models are blind to the desktop. They can't see a window, read a button label, click a menu item, or control the desktop.

desktopmcp solves this by giving AI models two complementary ways to interact with the desktop:

  • Visual: screenshots, screen capture, color picking, mouse and keyboard input
  • Semantic: the full accessibility tree (AT-SPI), where every UI element has a name, role, position, and set of actions the AI can invoke directly

The semantic path means an AI can call find_element(role="push button", name="Save") instead of scanning pixels to find where "Save" might be. It can read text content, check which element is focused, traverse the UI tree, and perform actions like "click" on any element -- all without needing a screenshot.
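As a rough illustration of what such a call looks like on the wire, here is a minimal sketch of the JSON-RPC request an MCP client would send. The `tools/call` method and its `name`/`arguments` parameters come from the MCP specification; the tool name and its `role`/`name` arguments are taken from the text above. The helper `tool_call` is a hypothetical name used only for this example.

```python
import json
from itertools import count

# Monotonic request ids; JSON-RPC only requires them to be unique per session.
_ids = count(1)


def tool_call(tool: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# The semantic lookup from the text: find the "Save" push button.
request = tool_call("find_element", {"role": "push button", "name": "Save"})
print(json.dumps(request, indent=2))
```

In practice an MCP client library builds this message for you; the sketch only shows that the semantic lookup is a single structured request rather than an image-analysis step.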

Sensitive operations go through XDG Desktop Portals, so the user gets permission dialogs, and a single binary runs on GNOME, KDE, Sway, or any other Wayland compositor with a portal backend.

144 tools organized into five groups:

| Category | Tools | What they do |
|---|---|---|
| Remote Desktop & Input | 13 | Start sessions, take screenshots, move/click mouse, type text, touch events, scroll |
| XDG Portals | 35 | Notifications, file dialogs, clipboard, wallpaper, network status, location, email, printing, camera, settings, global shortcuts, secrets, power profile, game mode, and more |
| Dynamic Launcher | 8 | Install, launch, inspect, and remove .desktop application launchers |
| AT-SPI Accessibility | 76 | Full accessibility tree: find elements, read text, click buttons, get/set values, table and hyperlink inspection, document metadata, event subscriptions, collection search |
| D-Bus Bridge | 12 | Call any D-Bus method, read/write properties, introspect services, subscribe to signals |

AI model (Claude, etc.)
    |
    |  MCP protocol (JSON-RPC over stdio or HTTP)
    v
desktopmcp               (Rust, async, single binary)
    |
    |--- XDG Desktop Portals  (ashpd)  --> screenshots, input, files, settings, ...
    |--- AT-SPI bus            (zbus)   --> UI tree, element actions, text, events
    |--- D-Bus session/system  (zbus)   --> arbitrary service access
    |--- PipeWire              (pipewire-rs) --> screen capture frames
    v
Linux desktop  (GNOME / KDE / Sway / ...)

XDG Portals ensure every sensitive operation (screen capture, input injection, file access) goes through a user-facing permission dialog. The AI cannot act without the user granting consent.

  • Linux with a Wayland compositor (GNOME 40+, KDE Plasma 5.27+, Sway, or similar)
  • xdg-desktop-portal and a desktop-specific backend (xdg-desktop-portal-gnome, -kde, -wlr)
  • PipeWire (for screen capture)
  • AT-SPI2 (at-spi2-core, typically pre-installed on any desktop with accessibility support)
  • D-Bus session bus (present on all modern Linux desktops)

Build requirements:

  • Rust 1.85+ (edition 2024)
  • System libraries: libdbus, libpipewire-0.3, at-spi2-core, pkg-config

  • LLVM/Clang (for pipewire-sys bindgen)

Run directly without cloning:

nix run github:varbhat/desktopmcp -- -t http # or stdio

Or clone and build locally. The repository includes a flake.nix that provides all system dependencies automatically:

git clone <repo-url>
cd desktopmcp
nix develop
cargo build --release

We use GitHub Actions to build and release AppImages automatically. Grab the latest AppImage from the releases page, mark it as executable, and run it. The AppImage is built with nix-appimage, which bundles the Nix derivation and all its dependencies into a single-file executable, so the file is large. We plan to reduce its size in upcoming releases, but for now it lets you try desktopmcp.

Install system dependencies, then build:

# Debian / Ubuntu
sudo apt install libdbus-1-dev libpipewire-0.3-dev at-spi2-core \
                 pkg-config libclang-dev

# Fedora
sudo dnf install dbus-devel pipewire-devel at-spi2-core-devel \
                 pkg-config clang-devel

# Arch Linux
sudo pacman -S dbus pipewire at-spi2-core pkgconf clang

cargo build --release

The binary is at target/release/desktopmcp.

desktopmcp                                         # stdio transport (default)

desktopmcp --transport http --bind 127.0.0.1:3000  # HTTP transport

In stdio mode, MCP messages are exchanged over stdin/stdout (JSON-RPC) and logs go to stderr. In HTTP mode, the server listens at http://<bind>/mcp using Streamable HTTP with Server-Sent Events. To use stdio mode, configure your MCP client with the path to the desktopmcp binary. To use HTTP mode, start desktopmcp as a remote MCP server and configure your MCP client with its URL.
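For HTTP mode, the request shape can be sketched with the Python standard library. This is a minimal sketch, not the project's client: `build_mcp_post` is a hypothetical helper, the bind address is the one from the command above, and the `initialize` payload is copied from the raw test message in the development notes. The `Accept` header listing both JSON and `text/event-stream` follows the Streamable HTTP transport in the MCP specification.

```python
import json
import urllib.request


def build_mcp_post(bind: str, message: dict) -> urllib.request.Request:
    """Build (but do not send) a POST carrying one JSON-RPC message to /mcp."""
    return urllib.request.Request(
        f"http://{bind}/mcp",
        data=json.dumps(message).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients accept a plain JSON body or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )


# Same initialize message as the raw stdio test in this article.
initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "test", "version": "1"},
    },
}

request = build_mcp_post("127.0.0.1:3000", initialize)
print(request.get_method(), request.full_url)
```

With the server running, `urllib.request.urlopen(request)` would send it; the reply may arrive either as a JSON body or as a Server-Sent Events stream, so a real client has to handle both.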

Ask AI: "What windows are open on my desktop?"

The AI calls get_window_list and gets back a structured list of every window with title, application name, position, and size -- no screenshot needed.

Ask AI: "Click the Save button in Firefox"

1. find_element(role="push button", name="Save", app_name="Firefox")
   -> returns element ID with position and available actions
2. atspi_do_action(id="...", action_name="click")
   -> button is clicked via the accessibility API
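The two steps above can be sketched as a small client-side helper. This is an illustration under stated assumptions: `call_tool` stands in for whatever function your MCP client uses to invoke a tool, and the `"id"` key on the `find_element` result is a guess at the result shape, not something the source documents. The tool names and argument names are the ones shown in the steps.

```python
from typing import Any, Callable

# Hypothetical transport hook: (tool name, arguments) -> decoded tool result.
CallTool = Callable[[str, dict], Any]


def click_button(call_tool: CallTool, label: str, app_name: str) -> bool:
    """Find a push button by label and click it through the accessibility API."""
    found = call_tool(
        "find_element",
        {"role": "push button", "name": label, "app_name": app_name},
    )
    element_id = found.get("id")  # assumption: the result exposes the element ID as "id"
    if element_id is None:
        return False  # nothing matched, so there is nothing to click
    call_tool("atspi_do_action", {"id": element_id, "action_name": "click"})
    return True


# Fake transport so the sketch runs without a desktop session.
calls = []


def fake_call_tool(name: str, arguments: dict) -> dict:
    calls.append((name, arguments))
    return {"id": "elem-1"} if name == "find_element" else {}


clicked = click_button(fake_call_tool, "Save", "Firefox")
print(clicked, [name for name, _ in calls])
```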

Ask AI: "Take a screenshot and tell me what you see"

1. start_session(devices=["pointer"], with_screencast=true)
   -> user approves the permission dialog once
2. take_screenshot(session_id="...", format="jpeg")
   -> AI receives a base64-encoded image
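On the client side, the base64 payload has to be decoded before it can be saved or displayed. The sketch below assumes the screenshot arrives as a standard MCP image content block (`type`, `data`, `mimeType`), which is how the MCP specification represents images in tool results; whether desktopmcp uses exactly this shape is an assumption. `extract_images` is a hypothetical helper.

```python
import base64


def extract_images(result: dict) -> list[tuple[str, bytes]]:
    """Return (mime type, raw bytes) for every image block in a tool result."""
    images = []
    for block in result.get("content", []):
        if block.get("type") == "image":
            mime = block.get("mimeType", "application/octet-stream")
            images.append((mime, base64.b64decode(block["data"])))
    return images


# Stand-in result: three bytes of a JPEG header instead of a real frame.
sample_result = {
    "content": [
        {
            "type": "image",
            "mimeType": "image/jpeg",
            "data": base64.b64encode(b"\xff\xd8\xff").decode("ascii"),
        }
    ]
}

images = extract_images(sample_result)
print(images[0][0], len(images[0][1]))
```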

Ask AI: "Type my email address into the login form"

1. find_element(role="entry", name="Email")
   -> finds the text field
2. type_into(id="...", text="user@example.com")
   -> text is entered via the EditableText accessibility interface

Ask AI: "Send me a notification when the download finishes"

1. send_notification(id="dl-done", title="Download Complete", body="file.zip is ready")
   -> desktop notification appears

Except where noted, these tools require an active session created with start_session. The user sees a one-time permission dialog.

| Tool | Description |
|---|---|
| start_session | Create a remote desktop session (optionally with screencast and clipboard) |
| stop_session | End a session and release resources |
| simple_screenshot | One-shot screenshot via the Screenshot portal (no session needed) |
| take_screenshot | Capture a frame from an active screencast session (PNG or JPEG) |
| pick_color | Pick a color from anywhere on screen |
| mouse_move | Move the mouse by a relative offset |
| mouse_move_absolute | Move the mouse to an absolute screen position |
| mouse_click | Click a mouse button (left, right, middle) |
| mouse_scroll | Scroll the mouse wheel (smooth) |
| mouse_scroll_discrete | Scroll the mouse wheel (discrete click-by-click steps) |
| keyboard_key | Press, release, or tap a key by keycode |
| keyboard_type | Type a text string as a sequence of key taps |
| touch_event | Send touch events (down, motion, up) for touchscreen simulation |
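Because a session holds resources until stop_session is called, a client may want to tie its lifetime to a scope. The following is a minimal sketch of that idea, not part of desktopmcp: `call_tool` is a hypothetical hook into your MCP client, and the `"session_id"` key on the start_session result is an assumption about the result shape. The `devices` and `with_screencast` arguments are the ones shown in the screenshot example.

```python
from contextlib import contextmanager


@contextmanager
def desktop_session(call_tool, devices, with_screencast=False):
    """Start a remote desktop session and always stop it on exit."""
    started = call_tool(
        "start_session",
        {"devices": devices, "with_screencast": with_screencast},
    )
    session_id = started["session_id"]  # assumption: key name in the result
    try:
        yield session_id
    finally:
        # Runs even if the body raises, so the session is not leaked.
        call_tool("stop_session", {"session_id": session_id})


# Fake transport so the sketch runs without a desktop.
log = []


def fake_call_tool(name, arguments):
    log.append(name)
    return {"session_id": "s-1"} if name == "start_session" else {}


try:
    with desktop_session(fake_call_tool, ["pointer"], with_screencast=True) as sid:
        fake_call_tool("take_screenshot", {"session_id": sid, "format": "jpeg"})
        raise RuntimeError("simulated failure mid-session")
except RuntimeError:
    pass

print(log)
```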

These tools use sandboxed XDG Desktop Portal APIs. Most work without a session.

| Tool | Description |
|---|---|
| send_notification | Send a desktop notification |
| open_uri | Open a URI in the default application |
| open_file | Open a local file in its default application |
| open_directory | Open a directory in the file manager |
| scheme_supported | Check if a URI scheme (https, ftp, etc.) is handled |
| open_file_dialog | Show an open-file dialog |
| save_file_dialog | Show a save-file dialog |
| save_files_dialog | Show a save-multiple-files dialog |
| clipboard_read | Read text from the clipboard (requires session) |
| clipboard_write | Write text to the clipboard (requires session) |
| get_appearance | Get color scheme and accent color |
| read_setting | Read a specific XDG setting by namespace and key |
| read_all_settings | Read all settings in one or more namespaces |
| network_status | Check network connectivity, metered status |
| can_reach | Test reachability of a hostname:port |
| set_wallpaper | Set wallpaper from a URI |
| set_wallpaper_file | Set wallpaper from a local file |
| trash_file | Move a file to the system trash |
| get_user_information | Get current user's ID, name, avatar |
| request_background | Request permission to run in the background |
| set_background_status | Set a background status message |
| check_camera | Check if a camera is available |
| compose_email | Open the default email client with a pre-composed message |
| get_location | Get geographic coordinates (requires user permission) |
| get_memory_warning | Poll for low-memory warnings |
| get_power_profile | Check if power-saver mode is active |
| get_proxy | Resolve proxy settings for a URI |
| retrieve_secret | Get the app secret from the system keyring |
| game_mode_status | Check GameMode status |
| list_shortcuts | List registered global keyboard shortcuts |
| bind_shortcuts | Register global keyboard shortcuts |
| print_file | Print a file via the system print dialog |
| session_inhibit | Prevent logout, suspend, or idle |
| get_available_device_types | Query available input device types |
| get_screencast_capabilities | Query available cursor modes and source types |

Install and manage .desktop application launchers.

| Tool | Description |
|---|---|
| launcher_supported_types | Query supported launcher types (application, web app) |
| launcher_prepare_install | Show install dialog for user confirmation (returns a token) |
| launcher_install | Install a .desktop launcher using a token |
| launcher_request_token | Get an install token without a dialog |
| launcher_uninstall | Remove an installed launcher |
| launcher_launch | Launch an app by its .desktop file ID |
| launcher_get_desktop_entry | Read the .desktop file content |
| launcher_get_icon | Get the launcher icon as base64 |

Semantic access to every UI element on the desktop. No screenshot or coordinate guessing required.

Tree traversal & element inspection:

| Tool | Description |
|---|---|
| atspi_get_desktop | Get the accessibility tree root and list all applications |
| atspi_get_applications | List running applications with toolkit info |
| atspi_get_element | Get full properties of an element by ID |
| atspi_get_children | Get child elements |
| atspi_get_child_at_index | Get a specific child by index |
| atspi_get_parent | Get the parent element |
| atspi_get_properties | Get name, role, states, interfaces, position, size |
| atspi_get_attributes | Get key-value attributes (CSS properties, etc.) |
| atspi_get_relation_set | Get relations (labelled-by, controlled-by, etc.) |
| atspi_get_extended_properties | Get Locale, AccessibleId, HelpText |
| atspi_get_application_info | Get toolkit name/version, AT-SPI version |

Search & UI tree:

| Tool | Description |
|---|---|
| find_element | Search by role, name, and/or application (high-level) |
| find_focused | Get the currently focused element |
| get_ui_tree | Build a hierarchical UI tree to a given depth |
| get_window_list | List open windows with titles, positions, sizes |
| wait_for_element | Poll until an element appears or timeout |
| refresh_ui_cache | Refresh the application list |
| atspi_collection_get_matches | Fast server-side search by role, interfaces, attributes |
| atspi_collection_get_matches_to | Backwards search from a given element |
| atspi_get_active_descendant | Get the focused item in a container |

Actions & interaction:

| Tool | Description |
|---|---|
| click_element | Find an element and perform a click action (high-level) |
| type_into | Find a text field and type into it (high-level) |
| atspi_get_actions | List available actions on an element |
| atspi_do_action | Perform an action by name or index |
| atspi_get_action_details | Get name, description, key binding for one action |
| atspi_grab_focus | Move keyboard focus to an element |

Text:

| Tool | Description |
|---|---|
| read_element_text | Find an element and read its text (high-level) |
| atspi_get_text | Read text content, character count, caret offset |
| atspi_set_text | Replace text content |
| atspi_edit_text | Cut, copy, paste, insert, delete operations |
| atspi_set_caret_offset | Move the text cursor |
| atspi_get_text_at_offset | Get word/sentence/line at a character offset |
| atspi_get_string_at_offset | Get text segment by granularity |
| atspi_get_character_extents | Get screen bounding box of a character |
| atspi_get_offset_at_point | Get character offset at screen coordinates |
| atspi_get_text_selections | Get active text selection ranges |
| atspi_set_text_selection | Add, modify, or remove text selections |
| atspi_get_text_attributes | Get font, size, style at an offset |
| atspi_get_text_attribute_value | Get a single named text attribute |
| atspi_get_attribute_run | Get the uniform attribute run at an offset |
| atspi_get_default_attribute_set | Get default attributes for the whole text |
| atspi_get_range_extents | Get bounding box of a text range |
| atspi_get_bounded_ranges | Find text ranges within a screen rectangle |
| atspi_scroll_substring_to | Scroll a text range into view |
| atspi_scroll_substring_to_point | Scroll a text range to a specific screen point |

Component (geometry, scroll, focus):

| Tool | Description |
|---|---|
| atspi_get_position | Get screen position and size of an element |
| atspi_get_size | Get width and height |
| atspi_get_layer | Get rendering layer and alpha transparency |
| atspi_contains | Hit-test: check if a point is inside an element |
| atspi_get_accessible_at_point | Find the element at screen coordinates |
| atspi_scroll_to | Scroll an element into the viewport |
| atspi_scroll_to_point | Scroll to a specific point |
| atspi_set_geometry | Move or resize an element (position, size, or extents) |

Value, Selection, Table, Hyperlinks, Document, Image:

| Tool | Description |
|---|---|
| atspi_get_value / atspi_set_value | Read/write numeric widget values (sliders, spinners) |
| atspi_get_selection / atspi_select_item / atspi_deselect_item | Manage selections in lists and combo boxes |
| atspi_select_all / atspi_clear_selection / atspi_is_child_selected | Bulk selection operations |
| atspi_get_table_info / atspi_get_table_cell | Read table dimensions, cell content, row/column spans |
| atspi_table_select_row / atspi_table_select_column | Select table rows/columns |
| atspi_get_table_selection_counts | Get count of selected rows/columns |
| atspi_get_hyperlinks | List hyperlinks with URIs and character ranges |
| atspi_get_document_info | Get document locale, attributes (URL, MIME type), page count |
| atspi_get_document_text_selections / atspi_set_document_text_selections | Cross-element document selections |
| atspi_get_image_info | Get image description, locale, position, and size |

Events:

| Tool | Description |
|---|---|
| atspi_subscribe_events | Subscribe to AT-SPI event categories or specific events |
| atspi_unsubscribe_events | Cancel a subscription |
| atspi_get_pending_events | Poll buffered events |
| atspi_get_registered_events | List all currently registered event listeners |
| atspi_get_status | Check if AT-SPI is enabled and if a screen reader is active |

Direct access to any D-Bus service on the session or system bus. The AI can introspect and interact with arbitrary desktop services.

| Tool | Description |
|---|---|
| dbus_list_names | List all D-Bus service names |
| dbus_introspect | Introspect a service's interfaces, methods, signals, properties |
| dbus_list_objects | Walk the object tree of a service |
| dbus_call_method | Call any D-Bus method with JSON arguments |
| dbus_get_property | Get a property value |
| dbus_get_all_properties | Get all properties of an interface |
| dbus_set_property | Set a property value |
| dbus_subscribe_signal | Subscribe to D-Bus signals with a match rule |
| dbus_unsubscribe_signal | Cancel a signal subscription |
| dbus_get_signals | Poll buffered signals |
| dbus_list_subscriptions | List active subscriptions |
| dbus_get_name_owner | Resolve a well-known name to a unique bus name |

nix develop              # enter dev shell with all dependencies
cargo build              # compile
cargo clippy             # lint
cargo run                # run in stdio mode
cargo run -- -t http     # run in HTTP mode

Verify portals are running:

systemctl --user status xdg-desktop-portal
systemctl --user status pipewire

Enable debug logging:

RUST_LOG=debug cargo run

Test with a raw MCP message:

echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1"}}}' \
  | cargo run --quiet 2>/dev/null
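The same kind of test input can be assembled programmatically. The sketch below frames a short handshake the way the MCP stdio transport expects, one JSON-RPC message per line with no embedded newlines. The `initialize` message is the one from the command above; the follow-up `notifications/initialized` and `tools/list` messages are standard MCP methods. The helper names `frame` and `unframe` are hypothetical.

```python
import json


def frame(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps without indent escapes newlines inside strings,
    # so the encoded message always fits on one line.
    line = json.dumps(message, separators=(",", ":"))
    return line.encode("utf-8") + b"\n"


def unframe(stream: bytes) -> list[dict]:
    """Split newline-delimited bytes back into JSON-RPC messages."""
    return [json.loads(line) for line in stream.splitlines() if line.strip()]


handshake = [
    {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "test", "version": "1"},
        },
    },
    # Notification: no "id", so the server sends no reply to it.
    {"jsonrpc": "2.0", "method": "notifications/initialized"},
    {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
]

payload = b"".join(frame(message) for message in handshake)
print(payload.decode("utf-8"), end="")
```

Writing `payload` to the server's stdin (for example through `subprocess.Popen` with pipes) and reading stdout line by line with `unframe` would be one way to script the exchange; how the server behaves if stdin closes early is not covered by the source.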

Permission dialog does not appear

Restart the portal service:

systemctl --user restart xdg-desktop-portal

PipeWire connection fails

Check that PipeWire is running:

systemctl --user status pipewire
systemctl --user restart pipewire

AT-SPI returns no applications

Make sure the AT-SPI bus is running and accessibility is enabled:

dbus-send --session --print-reply --dest=org.a11y.Bus \
  /org/a11y/bus org.freedesktop.DBus.Properties.GetAll \
  string:org.a11y.Status

gsettings set org.gnome.desktop.interface toolkit-accessibility true

Mouse/keyboard input has no effect

  • Call start_session first with the required devices (["keyboard", "pointer"])
  • Accept the permission dialog when it appears
  • Verify the session is still active with a valid session_id

desktopmcp is designed around the XDG Desktop Portal security model:

  • Every sensitive operation requires explicit user consent via a system dialog
  • The server works inside Flatpak and other sandboxed environments
  • D-Bus and AT-SPI access follows standard Linux desktop permissions
  • No root privileges are needed
  • The user can deny or revoke access at any time

That said, this server gives AI models significant control over your desktop. Use it with trusted models, review what the AI does, and be aware that an unrestricted AI could perform unintended actions.
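One way to act on that advice is to put a small approval gate between the model and the server. The sketch below is purely illustrative and not a desktopmcp feature: `guarded`, `call_tool`, and `approve` are hypothetical names, the split into read-only and other tools is an assumption made for the example, and the `text` argument passed to keyboard_type is a guess at its parameter name.

```python
from typing import Any, Callable

# Assumption: an illustrative allowlist of tools that only read state.
READ_ONLY_TOOLS = {
    "get_window_list",
    "find_element",
    "get_ui_tree",
    "atspi_get_text",
    "dbus_list_names",
}


def guarded(
    call_tool: Callable[[str, dict], Any],
    approve: Callable[[str, dict], bool],
    audit: list,
) -> Callable[[str, dict], Any]:
    """Wrap a tool-calling function with an audit log and an approval step."""

    def wrapper(name: str, arguments: dict) -> Any:
        audit.append((name, arguments))  # record every attempt, allowed or not
        if name not in READ_ONLY_TOOLS and not approve(name, arguments):
            raise PermissionError(f"tool call rejected: {name}")
        return call_tool(name, arguments)

    return wrapper


# Demo with a fake transport and a reviewer that rejects everything.
audit_log = []
safe_call = guarded(lambda name, args: {"ok": True}, lambda name, args: False, audit_log)

listing = safe_call("get_window_list", {})
try:
    safe_call("keyboard_type", {"text": "hello"})  # "text" is a hypothetical argument name
    rejected = False
except PermissionError:
    rejected = True

print(listing, rejected, len(audit_log))
```

A gate like this complements the portal dialogs rather than replacing them: the dialogs control what the server may do at all, while the wrapper lets the user review individual calls.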

| Component | Library | Version |
|---|---|---|
| MCP protocol | rmcp | 1.7 |
| XDG Portals | ashpd | 0.13 |
| D-Bus / AT-SPI | zbus + zvariant | 5 |
| Screen capture | pipewire-rs | 0.10 |
| Async runtime | tokio | 1 |
| Image encoding | image-rs | 0.25 |
| Serialization | serde + serde_json | 1 |
| CLI | clap | 4 |