diff --git a/home-manager/progs/opencode.nix b/home-manager/progs/opencode.nix
index 9dd3d64..1a10479 100644
--- a/home-manager/progs/opencode.nix
+++ b/home-manager/progs/opencode.nix
@@ -86,6 +86,77 @@ in
   xdg.configFile."opencode/plugins/opencode-claude-bridge.js".source =
     "${opencode-claude-bridge}/lib/opencode-claude-bridge/dist/index.js";
 
+  xdg.configFile."opencode/skills/android-ui.md".text = ''
+    ---
+    name: android-ui
+    description: "Android UI automation via ADB - use for any Android device interaction, UI testing, screenshot analysis, element coordinate lookup, and gesture automation."
+    ---
+
+    # Android UI Interaction Workflow
+
+    ## 1. Taking Screenshots
+    ```sh
+    adb exec-out screencap -p > /tmp/screen.png
+    ```
+    Captures the current screen state as a PNG image.
+
+    ## 2. Analyzing Screenshots
+    Delegate screenshot analysis to an explore agent rather than analyzing images directly:
+    ```
+    mcp_task(subagent_type="explore", prompt="Analyze /tmp/screen.png. What screen is this? What elements are visible?")
+    ```
+    The agent describes the UI, identifies elements, and estimates Y coordinates.
+
+    ## 3. Getting Precise Element Coordinates
+    Dump the full UI hierarchy as XML with UI Automator:
+    ```sh
+    adb shell uiautomator dump /sdcard/ui.xml && adb pull /sdcard/ui.xml /tmp/ui.xml
+    ```
+    Then grep for specific elements:
+    ```sh
+    # Find by text
+    grep -oP 'text="Login".*?bounds="[^"]*"' /tmp/ui.xml
+    # Find by class
+    grep -oP 'class="android.widget.EditText".*?bounds="[^"]*"' /tmp/ui.xml
+    ```
+    Bounds format: `[left,top][right,bottom]`; tap the center: `((left+right)/2, (top+bottom)/2)`
+
+    ## 4. Tapping Elements
+    ```sh
+    adb shell input tap X Y
+    ```
+    Where X and Y are the pixel coordinates of the element's center, calculated from its bounds.
+
+    ## 5. Text Input
+    ```sh
+    adb shell input text "some_text"
+    ```
+    Note: spaces must be typed as `%s`, and shell special characters need escaping (`\!`, `\;`, etc.).
+
+    ## 6. Other Gestures
+    ```sh
+    # Swipe/scroll
+    adb shell input swipe startX startY endX endY duration_ms
+    # Key events
+    adb shell input keyevent KEYCODE_BACK
+    adb shell input keyevent KEYCODE_ENTER
+    ```
+
+    ## 7. WebView Limitations
+    - UI Automator can see WebView content if accessibility is enabled
+    - Touch events on iframe content (like Cloudflare Turnstile) often fail due to cross-origin isolation
+    - Form fields in WebViews work if you take their exact bounds from the UI dump
+
+    ## Typical Flow
+    1. Take screenshot → analyze with explore agent (get rough layout)
+    2. Dump UI hierarchy → grep for exact element bounds
+       - NEVER ASSUME COORDINATES. You must ALWAYS check first.
+       - Do this before ANY tap action, as elements on the screen may have changed.
+    3. Calculate center coordinates from bounds
+    4. Tap/interact
+    5. Wait → screenshot → verify result
+  '';
+
   xdg.configFile."opencode/skills/playwright.md".text =
     let
       browsers = pkgs.playwright-driver.browsers;
@@ -140,56 +211,6 @@ in
       ## Nix
       For using `nix build` append `-L` to get better visibility into the logs.
       If you get an error that a file can't be found, always try to `git add` the file before trying other troubleshooting steps.
-
-
-      ## Android UI Interaction Workflow Summary
-      1. Taking Screenshots
-      adb exec-out screencap -p > /tmp/screen.png
-      Captures the current screen state as a PNG image.
-
-      2. Analyzing Screenshots
-      I delegate screenshot analysis to an explore agent rather than analyzing images directly:
-      mcp_task(subagent_type="explore", prompt="Analyze /tmp/screen.png. What screen is this? What elements are visible?")
-      The agent describes the UI, identifies elements, and estimates Y coordinates.
-
-      3. Getting Precise Element Coordinates
-      UI Automator dump - extracts the full UI hierarchy as XML:
-      adb shell uiautomator dump /sdcard/ui.xml && adb pull /sdcard/ui.xml /tmp/ui.xml
-      Then grep for specific elements:
-      # Find by text
-      grep -oP 'text="Login".*?bounds="[^"]*"' /tmp/ui.xml
-      # Find by class
-      grep -oP 'class="android.widget.EditText".*?bounds="[^"]*"' /tmp/ui.xml
-      Bounds format: [left,top][right,bottom] → tap center: ((left+right)/2, (top+bottom)/2)
-
-      4. Tapping Elements
-      adb shell input tap X Y
-      Where X, Y are pixel coordinates from the bounds.
-
-      5. Text Input
-      adb shell input text "some_text"
-      Note: Special characters need escaping (\!, \;, etc.)
-
-      6. Other Gestures
-      # Swipe/scroll
-      adb shell input swipe startX startY endX endY duration_ms
-      # Key events
-      adb shell input keyevent KEYCODE_BACK
-      adb shell input keyevent KEYCODE_ENTER
-
-      7. WebView Limitation
-      - UI Automator can see WebView content if accessibility is enabled
-      - Touch events on iframe content (like Cloudflare Turnstile) often fail due to cross-origin isolation
-      - Form fields in WebViews work if you get exact bounds from the UI dump
-
-      Typical Flow
-      1. Take screenshot → analyze with explore agent (get rough layout)
-      2. Dump UI hierarchy → grep for exact element bounds
-         - NEVER ASSUME COORDINATES. You must ALWAYS check first.
-         - Do this before ANY tap action as elements on the screen may of changed.
-      3. Calculate center coordinates from bounds
-      4. Tap/interact
-      5. Wait → screenshot → verify result
     '';
     settings = {
       theme = "opencode";
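The dump → grep → center → tap steps in the android-ui skill above can be chained into a small shell helper. This is a minimal sketch, not part of the change. The function names (`bounds_center`, `find_bounds_by_text`, `tap_text`) are hypothetical. It assumes GNU `grep` with `-P` (as the skill's own grep examples do), a single connected device, and a text value free of regex metacharacters.

```sh
#!/bin/sh
# Hypothetical helpers (not part of the skill file): chain the dump -> grep ->
# center -> tap steps. Assumes GNU grep (-P) and exactly one connected device.

# bounds_center "[left,top][right,bottom]" prints "X Y", the integer center.
bounds_center() {
  # Replace every run of non-digits with a space, then split into 4 numbers.
  set -- $(printf '%s\n' "$1" | sed 's/[^0-9][^0-9]*/ /g')
  echo "$(( ($1 + $3) / 2 )) $(( ($2 + $4) / 2 ))"
}

# find_bounds_by_text FILE TEXT prints the bounds of the first node whose
# text attribute is TEXT. TEXT is used as a PCRE, so escape metacharacters.
find_bounds_by_text() {
  grep -oP "text=\"$2\".*?bounds=\"\K[^\"]*" "$1" | head -n 1
}

# tap_text TEXT re-dumps the hierarchy, then taps the center of the matching node.
tap_text() {
  adb shell uiautomator dump /sdcard/ui.xml >/dev/null &&
    adb pull /sdcard/ui.xml /tmp/ui.xml >/dev/null || return 1
  _bounds=$(find_bounds_by_text /tmp/ui.xml "$1")
  if [ -z "$_bounds" ]; then
    echo "no node with text=\"$1\" in /tmp/ui.xml" >&2
    return 1
  fi
  # Reuse the positional parameters to hold X and Y.
  set -- $(bounds_center "$_bounds")
  adb shell input tap "$1" "$2"
}
```

For example, `tap_text Login` dumps the hierarchy, finds the first node with `text="Login"`, and taps the center of its bounds. Because it re-dumps on every call, it follows the skill's rule of checking coordinates before each tap. Integer division can place the tap up to half a pixel off the exact center, which does not matter in practice.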