{"slug": "show-hn-desktopmcp-mcp-server-for-the-linux-desktop", "title": "Show HN: Desktopmcp – MCP server for the Linux desktop", "summary": "Desktopmcp, a new MCP server for the Linux desktop, gives AI models access to the desktop through 144 tools via XDG Desktop Portals, AT-SPI, and D-Bus, enabling visual and semantic interaction with the UI. The open-source tool runs on Wayland compositors and requires user permission for sensitive operations.", "body_md": "MCP server for the Linux desktop. It gives AI models access to the Linux desktop.\n\ndesktopmcp connects AI assistants to the Linux desktop through three system interfaces: XDG Desktop Portals for sandboxed desktop operations, AT-SPI for semantic UI understanding, and D-Bus for low-level system access. It exposes 144 tools over the Model Context Protocol, letting an AI see what's on screen, understand UI structure, move the cursor, click buttons, type text, manage files, and interact with desktop services.\n\nAI models are blind to the desktop. They can't see a window, read a button label, click a menu item, or control the desktop.\n\ndesktopmcp solves this by giving AI models two complementary ways to interact with the desktop:\n\n- **Visual**: screenshots, screen capture, color picking, mouse and keyboard input\n- **Semantic**: the full accessibility tree (AT-SPI), where every UI element has a name, role, position, and set of actions the AI can invoke directly\n\nThe semantic path means an AI can call `find_element(role=\"push button\", name=\"Save\")` instead of scanning pixels to find where \"Save\" might be. 
It can read text content, check which element is focused, traverse the UI tree, and perform actions like \"click\" on any element -- all without needing a screenshot.\n\nSensitive operations go through XDG Desktop Portals, so the user gets permission dialogs, and a single binary runs on GNOME, KDE, Sway, or any Wayland compositor.\n\nThe **144 tools** are organized into five groups:\n\n| Category | Tools | What they do |\n|---|---|---|\n| Remote Desktop & Input | 13 | Start sessions, take screenshots, move/click mouse, type text, touch events, scroll |\n| XDG Portals | 35 | Notifications, file dialogs, clipboard, wallpaper, network status, location, email, printing, camera, settings, global shortcuts, secrets, power profile, game mode, and more |\n| Dynamic Launcher | 8 | Install, launch, inspect, and remove `.desktop` application launchers |\n| AT-SPI Accessibility | 76 | Full accessibility tree: find elements, read text, click buttons, get/set values, table and hyperlink inspection, document metadata, event subscriptions, collection search |\n| D-Bus Bridge | 12 | Call any D-Bus method, read/write properties, introspect services, subscribe to signals |\n\n```\nAI model (Claude, etc.)\n    |\n    |  MCP protocol (JSON-RPC over stdio or HTTP)\n    v\ndesktopmcp               (Rust, async, single binary)\n    |\n    |--- XDG Desktop Portals  (ashpd)  --> screenshots, input, files, settings, ...\n    |--- AT-SPI bus            (zbus)   --> UI tree, element actions, text, events\n    |--- D-Bus session/system  (zbus)   --> arbitrary service access\n    |--- PipeWire              (pipewire-rs) --> screen capture frames\n    v\nLinux desktop  (GNOME / KDE / Sway / ...)\n```\n\nXDG Portals ensure every sensitive operation (screen capture, input injection, file access) goes through a user-facing permission dialog. 
The AI cannot perform these operations without the user granting consent.\n\n- Linux with a Wayland compositor (GNOME 40+, KDE Plasma 5.27+, Sway, or similar)\n- `xdg-desktop-portal` and a desktop-specific backend (`xdg-desktop-portal-gnome`, `-kde`, `-wlr`)\n- PipeWire (for screen capture)\n- AT-SPI2 (`at-spi2-core`, typically pre-installed on any desktop with accessibility support)\n- D-Bus session bus (present on all modern Linux desktops)\n\nBuild requirements:\n\n- Rust 1.85+ (edition 2024)\n- System libraries: `libdbus`, `libpipewire-0.3`, `at-spi2-core`, `pkg-config`\n- LLVM/Clang (for pipewire-sys bindgen)\n\nRun directly without cloning:\n\n```\nnix run github:varbhat/desktopmcp -- -t http # or stdio\n```\n\nOr clone and build locally. The repository includes a `flake.nix` that provides all system dependencies automatically:\n\n```\ngit clone <repo-url>\ncd desktopmcp\nnix develop\ncargo build --release\n```\n\nWe use GitHub Actions to build and release AppImages automatically. Grab the latest AppImage from the [releases page](https://github.com/varbhat/desktopmcp/releases). Mark it as executable and run it. 
It's built using [nix-appimage](https://github.com/ralismark/nix-appimage), which bundles the Nix derivation and all its dependencies into a single-file executable, so the AppImage is very large (we plan to optimize this in upcoming releases, but it lets you try it).\n\nInstall system dependencies, then build:\n\n```\n# Debian / Ubuntu\nsudo apt install libdbus-1-dev libpipewire-0.3-dev at-spi2-core \\\n                 pkg-config libclang-dev\n\n# Fedora\nsudo dnf install dbus-devel pipewire-devel at-spi2-core-devel \\\n                 pkg-config clang-devel\n\n# Arch\nsudo pacman -S dbus pipewire at-spi2-core pkgconf clang\n\n# Build\ncargo build --release\n```\n\nThe binary is at `target/release/desktopmcp`.\n\n```\n# stdio mode (default; your MCP client launches this for you)\ndesktopmcp\n\n# HTTP mode, for remote MCP clients (start this before connecting)\ndesktopmcp --transport http --bind 127.0.0.1:3000\n```\n\nIn stdio mode, MCP messages are exchanged over stdin/stdout (JSON-RPC). Logs go to stderr. In HTTP mode, the server listens at `http://<bind>/mcp` using Streamable HTTP with Server-Sent Events. You can configure your MCP client to use desktopmcp in stdio mode by specifying the binary path, or run desktopmcp in HTTP mode and configure your MCP client with the URL.\n\n**Ask AI**: \"What windows are open on my desktop?\"\n\nThe AI calls `get_window_list` and gets back a structured list of every window with title, application name, position, and size -- no screenshot needed.\n\n**Ask AI**: \"Click the Save button in Firefox\"\n\n```\n1. find_element(role=\"push button\", name=\"Save\", app_name=\"Firefox\")\n   -> returns element ID with position and available actions\n2. atspi_do_action(id=\"...\", action_name=\"click\")\n   -> button is clicked via the accessibility API\n```\n\n**Ask AI**: \"Take a screenshot and tell me what you see\"\n\n```\n1. 
start_session(devices=[\"pointer\"], with_screencast=true)\n   -> user approves the permission dialog once\n2. take_screenshot(session_id=\"...\", format=\"jpeg\")\n   -> AI receives a base64-encoded image\n```\n\n**Ask AI**: \"Type my email address into the login form\"\n\n```\n1. find_element(role=\"entry\", name=\"Email\")\n   -> finds the text field\n2. type_into(id=\"...\", text=\"user@example.com\")\n   -> text is entered via the EditableText accessibility interface\n```\n\n**Ask AI**: \"Send me a notification when the download finishes\"\n\n```\n1. send_notification(id=\"dl-done\", title=\"Download Complete\", body=\"file.zip is ready\")\n   -> desktop notification appears\n```\n\nUnless noted otherwise, these tools require an active session created with `start_session`. The user sees a one-time permission dialog.\n\n| Tool | Description |\n|---|---|\n| `start_session` | Create a remote desktop session (optionally with screencast and clipboard) |\n| `stop_session` | End a session and release resources |\n| `simple_screenshot` | One-shot screenshot via the Screenshot portal (no session needed) |\n| `take_screenshot` | Capture a frame from an active screencast session (PNG or JPEG) |\n| `pick_color` | Pick a color from anywhere on screen |\n| `mouse_move` | Move the mouse by a relative offset |\n| `mouse_move_absolute` | Move the mouse to an absolute screen position |\n| `mouse_click` | Click a mouse button (left, right, middle) |\n| `mouse_scroll` | Scroll the mouse wheel (smooth) |\n| `mouse_scroll_discrete` | Scroll the mouse wheel (discrete click-by-click steps) |\n| `keyboard_key` | Press, release, or tap a key by keycode |\n| `keyboard_type` | Type a text string as a sequence of key taps |\n| `touch_event` | Send touch events (down, motion, up) for touchscreen simulation |\n\nThese tools use sandboxed XDG Desktop Portal APIs. 
Most work without a session.\n\n| Tool | Description |\n|---|---|\n| `send_notification` | Send a desktop notification |\n| `open_uri` | Open a URI in the default application |\n| `open_file` | Open a local file in its default application |\n| `open_directory` | Open a directory in the file manager |\n| `scheme_supported` | Check if a URI scheme (https, ftp, etc.) is handled |\n| `open_file_dialog` | Show an open-file dialog |\n| `save_file_dialog` | Show a save-file dialog |\n| `save_files_dialog` | Show a save-multiple-files dialog |\n| `clipboard_read` | Read text from the clipboard (requires session) |\n| `clipboard_write` | Write text to the clipboard (requires session) |\n| `get_appearance` | Get color scheme and accent color |\n| `read_setting` | Read a specific XDG setting by namespace and key |\n| `read_all_settings` | Read all settings in one or more namespaces |\n| `network_status` | Check network connectivity, metered status |\n| `can_reach` | Test reachability of a hostname:port |\n| `set_wallpaper` | Set wallpaper from a URI |\n| `set_wallpaper_file` | Set wallpaper from a local file |\n| `trash_file` | Move a file to the system trash |\n| `get_user_information` | Get current user's ID, name, avatar |\n| `request_background` | Request permission to run in the background |\n| `set_background_status` | Set a background status message |\n| `check_camera` | Check if a camera is available |\n| `compose_email` | Open the default email client with a pre-composed message |\n| `get_location` | Get geographic coordinates (requires user permission) |\n| `get_memory_warning` | Poll for low-memory warnings |\n| `get_power_profile` | Check if power-saver mode is active |\n| `get_proxy` | Resolve proxy settings for a URI |\n| `retrieve_secret` | Get the app secret from the system keyring |\n| `game_mode_status` | Check GameMode status |\n| `list_shortcuts` | List registered global keyboard shortcuts |\n| `bind_shortcuts` | Register global keyboard shortcuts |\n| `print_file` | Print a file via the 
system print dialog |\n| `session_inhibit` | Prevent logout, suspend, or idle |\n| `get_available_device_types` | Query available input device types |\n| `get_screencast_capabilities` | Query available cursor modes and source types |\n\nInstall and manage `.desktop` application launchers.\n\n| Tool | Description |\n|---|---|\n| `launcher_supported_types` | Query supported launcher types (application, web app) |\n| `launcher_prepare_install` | Show install dialog for user confirmation (returns a token) |\n| `launcher_install` | Install a `.desktop` launcher using a token |\n| `launcher_request_token` | Get an install token without a dialog |\n| `launcher_uninstall` | Remove an installed launcher |\n| `launcher_launch` | Launch an app by its `.desktop` file ID |\n| `launcher_get_desktop_entry` | Read the `.desktop` file content |\n| `launcher_get_icon` | Get the launcher icon as base64 |\n\nSemantic access to every UI element on the desktop. No screenshot or coordinate guessing required.\n\n**Tree traversal & element inspection:**\n\n| Tool | Description |\n|---|---|\n| `atspi_get_desktop` | Get the accessibility tree root and list all applications |\n| `atspi_get_applications` | List running applications with toolkit info |\n| `atspi_get_element` | Get full properties of an element by ID |\n| `atspi_get_children` | Get child elements |\n| `atspi_get_child_at_index` | Get a specific child by index |\n| `atspi_get_parent` | Get the parent element |\n| `atspi_get_properties` | Get name, role, states, interfaces, position, size |\n| `atspi_get_attributes` | Get key-value attributes (CSS properties, etc.) |\n| `atspi_get_relation_set` | Get relations (labelled-by, controlled-by, etc.) 
|\n| `atspi_get_extended_properties` | Get Locale, AccessibleId, HelpText |\n| `atspi_get_application_info` | Get toolkit name/version, AT-SPI version |\n\n**Search & UI tree:**\n\n| Tool | Description |\n|---|---|\n| `find_element` | Search by role, name, and/or application (high-level) |\n| `find_focused` | Get the currently focused element |\n| `get_ui_tree` | Build a hierarchical UI tree to a given depth |\n| `get_window_list` | List open windows with titles, positions, sizes |\n| `wait_for_element` | Poll until an element appears or timeout |\n| `refresh_ui_cache` | Refresh the application list |\n| `atspi_collection_get_matches` | Fast server-side search by role, interfaces, attributes |\n| `atspi_collection_get_matches_to` | Backwards search from a given element |\n| `atspi_get_active_descendant` | Get the focused item in a container |\n\n**Actions & interaction:**\n\n| Tool | Description |\n|---|---|\n| `click_element` | Find an element and perform a click action (high-level) |\n| `type_into` | Find a text field and type into it (high-level) |\n| `atspi_get_actions` | List available actions on an element |\n| `atspi_do_action` | Perform an action by name or index |\n| `atspi_get_action_details` | Get name, description, key binding for one action |\n| `atspi_grab_focus` | Move keyboard focus to an element |\n\n**Text:**\n\n| Tool | Description |\n|---|---|\n| `read_element_text` | Find an element and read its text (high-level) |\n| `atspi_get_text` | Read text content, character count, caret offset |\n| `atspi_set_text` | Replace text content |\n| `atspi_edit_text` | Cut, copy, paste, insert, delete operations |\n| `atspi_set_caret_offset` | Move the text cursor |\n| `atspi_get_text_at_offset` | Get word/sentence/line at a character offset |\n| `atspi_get_string_at_offset` | Get text segment by granularity |\n| `atspi_get_character_extents` | Get screen bounding box of a character |\n| `atspi_get_offset_at_point` | Get character offset at screen coordinates 
|\n| `atspi_get_text_selections` | Get active text selection ranges |\n| `atspi_set_text_selection` | Add, modify, or remove text selections |\n| `atspi_get_text_attributes` | Get font, size, style at an offset |\n| `atspi_get_text_attribute_value` | Get a single named text attribute |\n| `atspi_get_attribute_run` | Get the uniform attribute run at an offset |\n| `atspi_get_default_attribute_set` | Get default attributes for the whole text |\n| `atspi_get_range_extents` | Get bounding box of a text range |\n| `atspi_get_bounded_ranges` | Find text ranges within a screen rectangle |\n| `atspi_scroll_substring_to` | Scroll a text range into view |\n| `atspi_scroll_substring_to_point` | Scroll a text range to a specific screen point |\n\n**Component (geometry, scroll, focus):**\n\n| Tool | Description |\n|---|---|\n| `atspi_get_position` | Get screen position and size of an element |\n| `atspi_get_size` | Get width and height |\n| `atspi_get_layer` | Get rendering layer and alpha transparency |\n| `atspi_contains` | Hit-test: check if a point is inside an element |\n| `atspi_get_accessible_at_point` | Find the element at screen coordinates |\n| `atspi_scroll_to` | Scroll an element into the viewport |\n| `atspi_scroll_to_point` | Scroll to a specific point |\n| `atspi_set_geometry` | Move or resize an element (position, size, or extents) |\n\n**Value, Selection, Table, Hyperlinks, Document, Image:**\n\n| Tool | Description |\n|---|---|\n| `atspi_get_value` / `atspi_set_value` | Read/write numeric widget values (sliders, spinners) |\n| `atspi_get_selection` / `atspi_select_item` / `atspi_deselect_item` | Manage selections in lists and combo boxes |\n| `atspi_select_all` / `atspi_clear_selection` / `atspi_is_child_selected` | Bulk selection operations |\n| `atspi_get_table_info` / `atspi_get_table_cell` | Read table dimensions, cell content, row/column spans |\n| `atspi_table_select_row` / `atspi_table_select_column` | Select table rows/columns |\n| `atspi_get_table_selection_counts` | Get 
count of selected rows/columns |\n| `atspi_get_hyperlinks` | List hyperlinks with URIs and character ranges |\n| `atspi_get_document_info` | Get document locale, attributes (URL, MIME type), page count |\n| `atspi_get_document_text_selections` / `atspi_set_document_text_selections` | Cross-element document selections |\n| `atspi_get_image_info` | Get image description, locale, position, and size |\n\n**Events:**\n\n| Tool | Description |\n|---|---|\n| `atspi_subscribe_events` | Subscribe to AT-SPI event categories or specific events |\n| `atspi_unsubscribe_events` | Cancel a subscription |\n| `atspi_get_pending_events` | Poll buffered events |\n| `atspi_get_registered_events` | List all currently registered event listeners |\n| `atspi_get_status` | Check if AT-SPI is enabled and if a screen reader is active |\n\nDirect access to any D-Bus service on the session or system bus. The AI can introspect and interact with arbitrary desktop services.\n\n| Tool | Description |\n|---|---|\n| `dbus_list_names` | List all D-Bus service names |\n| `dbus_introspect` | Introspect a service's interfaces, methods, signals, properties |\n| `dbus_list_objects` | Walk the object tree of a service |\n| `dbus_call_method` | Call any D-Bus method with JSON arguments |\n| `dbus_get_property` | Get a property value |\n| `dbus_get_all_properties` | Get all properties of an interface |\n| `dbus_set_property` | Set a property value |\n| `dbus_subscribe_signal` | Subscribe to D-Bus signals with a match rule |\n| `dbus_unsubscribe_signal` | Cancel a signal subscription |\n| `dbus_get_signals` | Poll buffered signals |\n| `dbus_list_subscriptions` | List active subscriptions |\n| `dbus_get_name_owner` | Resolve a well-known name to a unique bus name |\n\n```\nnix develop              # enter dev shell with all dependencies\ncargo build              # compile\ncargo clippy             # lint\ncargo run                # run in stdio mode\ncargo run -- -t http     # run in HTTP mode\n```\n\nVerify portals are 
running:\n\n```\nsystemctl --user status xdg-desktop-portal\nsystemctl --user status pipewire\n```\n\nEnable debug logging:\n\n```\nRUST_LOG=debug cargo run\n```\n\nTest with a raw MCP message:\n\n```\necho '{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"initialize\",\"params\":{\"protocolVersion\":\"2024-11-05\",\"capabilities\":{},\"clientInfo\":{\"name\":\"test\",\"version\":\"1\"}}}' \\\n  | cargo run --quiet 2>/dev/null\n```\n\n**Permission dialog does not appear**\n\nRestart the portal service:\n\n```\nsystemctl --user restart xdg-desktop-portal\n```\n\n**PipeWire connection fails**\n\nCheck that PipeWire is running, and restart it if needed:\n\n```\nsystemctl --user status pipewire\nsystemctl --user restart pipewire\n```\n\n**AT-SPI returns no applications**\n\nMake sure the AT-SPI bus is running and accessibility is enabled:\n\n```\n# Check AT-SPI status\ndbus-send --session --print-reply --dest=org.a11y.Bus \\\n  /org/a11y/bus org.freedesktop.DBus.Properties.GetAll \\\n  string:org.a11y.Status\n\n# Enable accessibility (on GNOME) if disabled\ngsettings set org.gnome.desktop.interface toolkit-accessibility true\n```\n\n**Mouse/keyboard input has no effect**\n\n- Call `start_session` first with the required devices (`[\"keyboard\", \"pointer\"]`)\n- Accept the permission dialog when it appears\n- Verify the session is still active with a valid `session_id`\n\ndesktopmcp is designed around the XDG Desktop Portal security model:\n\n- Every sensitive operation requires explicit user consent via a system dialog\n- The server works inside Flatpak and other sandboxed environments\n- D-Bus and AT-SPI access follows standard Linux desktop permissions\n- No root privileges are needed\n- The user can deny or revoke access at any time\n\nThat said, this server gives AI models significant control over your desktop. 
Use it with trusted models, review what the AI does, and be aware that an unrestricted AI could perform unintended actions.\n\n| Component | Library | Version |\n|---|---|---|\n| MCP protocol | rmcp | 1.7 |\n| XDG Portals | ashpd | 0.13 |\n| D-Bus / AT-SPI | zbus + zvariant | 5 |\n| Screen capture | pipewire-rs | 0.10 |\n| Async runtime | tokio | 1 |\n| Image encoding | image-rs | 0.25 |\n| Serialization | serde + serde_json | 1 |\n| CLI | clap | 4 |", "url": "https://wpnews.pro/news/show-hn-desktopmcp-mcp-server-for-the-linux-desktop", "canonical_source": "https://github.com/varbhat/desktopmcp", "published_at": "2026-06-24 22:59:18+00:00", "updated_at": "2026-06-24 23:13:58.593595+00:00", "lang": "en", "topics": ["ai-tools", "developer-tools", "ai-agents"], "entities": ["desktopmcp", "XDG Desktop Portals", "AT-SPI", "D-Bus", "PipeWire", "GNOME", "KDE", "Sway"], "alternates": {"html": "https://wpnews.pro/news/show-hn-desktopmcp-mcp-server-for-the-linux-desktop", "markdown": "https://wpnews.pro/news/show-hn-desktopmcp-mcp-server-for-the-linux-desktop.md", "text": "https://wpnews.pro/news/show-hn-desktopmcp-mcp-server-for-the-linux-desktop.txt", "jsonld": "https://wpnews.pro/news/show-hn-desktopmcp-mcp-server-for-the-linux-desktop.jsonld"}}