Initial commit: antigravity-claudekit

skills/ck-ai-multimodal/SKILL.md (121 lines, new file)

---
name: ck-ai-multimodal
description: >
  Analyzes images, videos, PDFs, and documents using multimodal AI models.
  Activate when user says 'analyze this image', 'describe what you see', 'read this PDF',
  'extract text from screenshot', 'what is in this photo', or 'process this document'.
  Accepts image files, PDFs, video frames, and URLs to visual content.
---

## Overview

Orchestrates multimodal AI analysis on images, documents, and visual content. Extracts structured information, descriptions, OCR text, or domain-specific insights from non-text inputs.

## When to Use

- Analyzing uploaded images for content, objects, or scene description
- Extracting text or data from screenshots, PDFs, or scanned documents
- Comparing multiple images for differences or similarities
- Processing diagrams, charts, or UI mockups to generate code or descriptions
- Describing visual content for accessibility or documentation purposes
- Analyzing and summarizing video frames

## Don't Use When

- Input is plain text only (no visual component)
- User needs to generate new images (use ck-ai-artist)
- Task is a simple file-format conversion with no AI analysis needed
- Document is a machine-readable text PDF (use direct text extraction)

## Steps / Instructions

### 1. Identify Input Type and Goal

Determine:

- Input format: image (JPEG/PNG/WebP), PDF, video, screenshot
- Analysis goal: description, OCR, data extraction, comparison, code generation
- Output format: plain text, JSON, markdown table, code snippet

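The input-format check can be sketched with the standard-library `mimetypes` module. `classify_input` is a hypothetical helper name, not part of this skill's required API, and its coverage of extensions is minimal:

```python
# Hypothetical helper: map a local path or URL to the broad input
# categories this skill handles, using stdlib mimetypes.
import mimetypes


def classify_input(path: str) -> str:
    """Return 'image', 'pdf', 'video', or 'unknown' for a path or URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        return 'unknown'
    if mime == 'application/pdf':
        return 'pdf'
    if mime.startswith('image/'):
        return 'image'
    if mime.startswith('video/'):
        return 'video'
    return 'unknown'
```

For example, `classify_input('scan.pdf')` returns `'pdf'` and `classify_input('shot.png')` returns `'image'`.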
### 2. Prepare Input

**Images:**

- Ensure the file is accessible (local path or URL)
- For large images, consider resizing to reduce token cost while preserving detail
- For PDFs, extract pages as images if needed

**Video:**

- Extract key frames at regular intervals or at scene changes
- Process frames individually or as a batch

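The video frame-extraction step above can be sketched with ffmpeg, assuming it is installed and on PATH. `ffmpeg_frame_cmd` and `extract_frames` are hypothetical helper names chosen for illustration:

```python
# Sketch of key-frame extraction via ffmpeg (assumed to be on PATH).
import pathlib
import subprocess


def ffmpeg_frame_cmd(video: str, out_dir: str, every_n_seconds: int = 5) -> list:
    """Build an ffmpeg command that writes one PNG frame every N seconds."""
    return [
        'ffmpeg', '-i', video,
        '-vf', f'fps=1/{every_n_seconds}',  # sample 1 frame per N seconds
        f'{out_dir}/frame_%04d.png',
    ]


def extract_frames(video: str, out_dir: str, every_n_seconds: int = 5) -> list:
    """Run ffmpeg and return the extracted frame paths, sorted."""
    pathlib.Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_frame_cmd(video, out_dir, every_n_seconds), check=True)
    return sorted(pathlib.Path(out_dir).glob('frame_*.png'))
```

Sampling at fixed intervals is the simplest strategy; scene-change detection (e.g. ffmpeg's `select` filter) trades simplicity for fewer redundant frames.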
### 3. Craft Analysis Prompt

Be specific about what to extract:

```
# For structured extraction:
"Extract all text from this receipt image and return as JSON with fields:
merchant, date, items (array of {name, price}), total, tax."

# For description:
"Describe this UI screenshot in detail, including layout, colors,
components, and any text visible. Focus on structure for a developer."

# For comparison:
"Compare these two screenshots. List all visible differences
in UI layout, text, and styling."

# For diagram-to-code:
"This is a flowchart. Convert it to a Mermaid diagram."
```

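Structured-extraction prompts like the receipt example can also be assembled programmatically from a field spec. `build_extraction_prompt` is a hypothetical helper sketched here, not part of this skill:

```python
# Hypothetical prompt builder: render a JSON-extraction prompt from a
# {field_name: description} mapping. Field names are illustrative.
def build_extraction_prompt(subject: str, fields: dict) -> str:
    """Build an extraction prompt listing each required JSON field."""
    field_lines = '\n'.join(f'- {name}: {desc}' for name, desc in fields.items())
    return (
        f'Extract all text from this {subject} and return as JSON '
        f'with these fields:\n{field_lines}\n'
        'Return only the JSON object, with no surrounding prose.'
    )


prompt = build_extraction_prompt('receipt image', {
    'merchant': 'store name',
    'total': 'grand total as a number',
})
```

Asking for "only the JSON object" reduces, but does not eliminate, the chance of prose or code fences wrapping the reply, so post-processing should still tolerate them.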
### 4. Call Multimodal Model

Using Google Gemini (via venv Python):

```python
# Use: ~/.claude/skills/.venv/bin/python3
import os
import pathlib

import google.generativeai as genai

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-1.5-pro')

# Pass raw bytes in the blob dict; the SDK handles encoding internally.
image_data = pathlib.Path('input.png').read_bytes()
image_part = {'mime_type': 'image/png', 'data': image_data}

response = model.generate_content([image_part, 'Describe this image in detail.'])
print(response.text)
```

Using OpenAI Vision:

```python
import base64
import os

import openai

client = openai.OpenAI(api_key=os.environ['OPENAI_API_KEY'])
with open('input.png', 'rb') as f:
    b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model='gpt-4o',
    messages=[{
        'role': 'user',
        'content': [
            {'type': 'image_url', 'image_url': {'url': f'data:image/png;base64,{b64}'}},
            {'type': 'text', 'text': 'Describe this image.'}
        ]
    }]
)
print(response.choices[0].message.content)
```

### 5. Post-Process Output

- Parse JSON if structured extraction was requested
- Validate extracted data against the expected schema
- For OCR results, clean whitespace and correct obvious errors
- For code generated from diagrams, run a syntax check

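The parse-and-validate steps above can be sketched as follows. `extract_json` and `validate_fields` are hypothetical helpers; the first also strips the markdown code fences that models often wrap JSON replies in:

```python
# Hypothetical post-processing helpers: strip an optional ```json fence,
# parse the payload, and report missing required keys.
import json
import re


def extract_json(raw: str) -> dict:
    """Parse a model reply that may wrap JSON in a markdown code fence."""
    match = re.search(r'```(?:json)?\s*(.*?)\s*```', raw, re.DOTALL)
    text = match.group(1) if match else raw
    return json.loads(text)


def validate_fields(data: dict, required: list) -> list:
    """Return the names of required fields missing from the parsed output."""
    return [f for f in required if f not in data]
```

A non-empty result from `validate_fields` is a natural trigger for the retry logic in step 6.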
### 6. Handle Errors

- If the model returns an incomplete extraction, retry with a more specific prompt
- For large PDFs, process in page chunks
- If image quality is poor, note the limitations in the output

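The retry-on-incomplete-extraction idea can be sketched as a generic loop. `analyze`, `is_complete`, and `refine` are hypothetical callables supplied by the caller, not part of this skill's API:

```python
# Sketch of a bounded retry loop: call the model, check completeness,
# and retry with a refined (more specific) prompt up to max_attempts.
def retry_extraction(analyze, is_complete, refine, prompt, max_attempts=3):
    """analyze: prompt -> parsed output; is_complete: output -> bool;
    refine: (prompt, output) -> more specific prompt."""
    result = analyze(prompt)
    for _ in range(max_attempts - 1):
        if is_complete(result):
            break
        prompt = refine(prompt, result)
        result = analyze(prompt)
    return result
```

Bounding the attempts matters: each retry costs tokens, and a prompt that fails twice usually needs a different strategy (e.g. better input quality) rather than a third rewording.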
## Notes

- Never hardcode API keys; use environment variables
- Gemini 1.5 Pro handles larger context windows and longer documents
- GPT-4o excels at UI and code understanding
- Always state a confidence level when extracting critical data (e.g., financial figures)