opencode: move android stuff to android-ui skill
@@ -86,6 +86,77 @@ in
 xdg.configFile."opencode/plugins/opencode-claude-bridge.js".source =
 "${opencode-claude-bridge}/lib/opencode-claude-bridge/dist/index.js";
 
+xdg.configFile."opencode/skills/android-ui.md".text = ''
+---
+name: android-ui
+description: "Android UI automation via ADB - use for any Android device interaction, UI testing, screenshot analysis, element coordinate lookup, and gesture automation."
+---
+
+# Android UI Interaction Workflow
+
+## 1. Taking Screenshots
+```
+adb exec-out screencap -p > /tmp/screen.png
+```
+Captures the current screen state as a PNG image.
+
+## 2. Analyzing Screenshots
+Delegate screenshot analysis to an explore agent rather than analyzing images directly:
+```
+mcp_task(subagent_type="explore", prompt="Analyze /tmp/screen.png. What screen is this? What elements are visible?")
+```
+The agent describes the UI, identifies elements, and estimates Y coordinates.
+
+## 3. Getting Precise Element Coordinates
+UI Automator dump - extracts the full UI hierarchy as XML:
+```
+adb shell uiautomator dump /sdcard/ui.xml && adb pull /sdcard/ui.xml /tmp/ui.xml
+```
+Then grep for specific elements:
+```sh
+# Find by text
+grep -oP 'text="Login".*?bounds="[^"]*"' /tmp/ui.xml
+# Find by class
+grep -oP 'class="android.widget.EditText".*?bounds="[^"]*"' /tmp/ui.xml
+```
+Bounds format: `[left,top][right,bottom]` — tap center: `((left+right)/2, (top+bottom)/2)`
+
+## 4. Tapping Elements
+```
+adb shell input tap X Y
+```
+Where X, Y are pixel coordinates from the bounds.
+
+## 5. Text Input
+```
+adb shell input text "some_text"
+```
+Note: Special characters need escaping (`\!`, `\;`, etc.)
+
+## 6. Other Gestures
+```sh
+# Swipe/scroll
+adb shell input swipe startX startY endX endY duration_ms
+# Key events
+adb shell input keyevent KEYCODE_BACK
+adb shell input keyevent KEYCODE_ENTER
+```
+
+## 7. WebView Limitation
+- UI Automator can see WebView content if accessibility is enabled
+- Touch events on iframe content (like Cloudflare Turnstile) often fail due to cross-origin isolation
+- Form fields in WebViews work if you get exact bounds from the UI dump
+
+## Typical Flow
+1. Take screenshot → analyze with explore agent (get rough layout)
+2. Dump UI hierarchy → grep for exact element bounds
+- NEVER ASSUME COORDINATES. You must ALWAYS check first.
+- Do this before ANY tap action as elements on the screen may have changed.
+3. Calculate center coordinates from bounds
+4. Tap/interact
+5. Wait → screenshot → verify result
+'';
+
 xdg.configFile."opencode/skills/playwright.md".text =
 let
 browsers = pkgs.playwright-driver.browsers;
@@ -140,56 +211,6 @@ in
 ## Nix
 For using `nix build` append `-L` to get better visibility into the logs.
 If you get an error that a file can't be found, always try to `git add` the file before trying other troubleshooting steps.
-
-
-## Android UI Interaction Workflow Summary
-1. Taking Screenshots
-adb exec-out screencap -p > /tmp/screen.png
-Captures the current screen state as a PNG image.
-
-2. Analyzing Screenshots
-I delegate screenshot analysis to an explore agent rather than analyzing images directly:
-mcp_task(subagent_type="explore", prompt="Analyze /tmp/screen.png. What screen is this? What elements are visible?")
-The agent describes the UI, identifies elements, and estimates Y coordinates.
-
-3. Getting Precise Element Coordinates
-UI Automator dump - extracts the full UI hierarchy as XML:
-adb shell uiautomator dump /sdcard/ui.xml && adb pull /sdcard/ui.xml /tmp/ui.xml
-Then grep for specific elements:
-# Find by text
-grep -oP 'text="Login".*?bounds="[^"]*"' /tmp/ui.xml
-# Find by class
-grep -oP 'class="android.widget.EditText".*?bounds="[^"]*"' /tmp/ui.xml
-Bounds format: [left,top][right,bottom] → tap center: ((left+right)/2, (top+bottom)/2)
-
-4. Tapping Elements
-adb shell input tap X Y
-Where X, Y are pixel coordinates from the bounds.
-
-5. Text Input
-adb shell input text "some_text"
-Note: Special characters need escaping (\!, \;, etc.)
-
-6. Other Gestures
-# Swipe/scroll
-adb shell input swipe startX startY endX endY duration_ms
-# Key events
-adb shell input keyevent KEYCODE_BACK
-adb shell input keyevent KEYCODE_ENTER
-
-7. WebView Limitation
-- UI Automator can see WebView content if accessibility is enabled
-- Touch events on iframe content (like Cloudflare Turnstile) often fail due to cross-origin isolation
-- Form fields in WebViews work if you get exact bounds from the UI dump
-
-Typical Flow
-1. Take screenshot → analyze with explore agent (get rough layout)
-2. Dump UI hierarchy → grep for exact element bounds
-- NEVER ASSUME COORDINATES. You must ALWAYS check first.
-- Do this before ANY tap action as elements on the screen may of changed.
-3. Calculate center coordinates from bounds
-4. Tap/interact
-5. Wait → screenshot → verify result
 '';
 settings = {
 theme = "opencode";
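
Step 3 of the skill's "Typical Flow" (calculate center coordinates from bounds) could be scripted roughly as follows. This is a minimal sketch and not part of the commit: the function name `bounds_center` is hypothetical, and it assumes the bounds string has exactly the `[left,top][right,bottom]` form given in the skill text.

```sh
# Hypothetical helper (not in the commit): print the tap center for a
# UI Automator bounds string of the form "[left,top][right,bottom]".
bounds_center() {
  # Replace brackets and commas with spaces, then split into four integers.
  set -- $(printf '%s\n' "$1" | sed 's/[][,]/ /g')
  # Integer midpoint of each axis: ((left+right)/2, (top+bottom)/2).
  echo "$(( ($1 + $3) / 2 )) $(( ($2 + $4) / 2 ))"
}

# Example: bounds_center '[84,1210][996,1342]' prints "540 1276"
```

The two numbers it prints can be passed straight to the tap command from the skill, for example `adb shell input tap $(bounds_center "$bounds")`, where `$bounds` is assumed to hold a value taken from a fresh UI dump.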