ENGINEERING
DATA SOVEREIGNTY.
Reverse-engineering the Notion Internal API to migrate a knowledge base.
Recursive crawling, checkpoint resume, document assembly, and automated sync to Confluence.
// README.md: THE_WHY
"In the era of AI & Vibe Coding, building tools is faster than ever. But Token costs and Time are real."
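We find the problem first, then evaluate whether it is worth the cost before eliminating it.
Manually migrating 500+ Notion pages = 120+ hours of manual labor (a cost paid every time)
Building a crawler and automation pipeline = one-time development cost + 1 hour (per run)
This is why we use AI to write tooling quickly: it frees people from repetitive work so they can solve harder problems.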
PERFORMANCE_METRICS
> Data Integrity: 100%
> Memory Usage: <150MB (Streaming)
Exponential Efficiency
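Based on a real migration stress test of 500+ pages. Moving pages by hand is extremely time-consuming (an estimated 9 minutes per page) and easily breaks the page hierarchy. Notion Crawler uses concurrent crawling and automated transformation to cut weeks of work down to about 1 hour.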
SYSTEM_ARCHITECTURE
Recursive Crawler Engine
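- Reverse Internal API: calls `loadPageChunk` directly to fetch structured Block data, roughly 10x faster than Playwright.
- Recursive traversal: automatically resolves sub-pages and Database Rows, accurately reproducing a hierarchy of unlimited depth.
- Anti-detection: implements Exponential Backoff with Jitter (randomized delays) to avoid 429 Rate Limit responses.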
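The backoff behavior above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the `fetch` callable returning a `(status, body)` tuple, and the `base`/`cap` defaults are all assumptions, and it uses the "full jitter" variant (a uniform delay below an exponentially growing ceiling).

```python
import random
import time


def backoff_delays(max_retries, base=1.0, cap=60.0, rng=random.random):
    """Yield one full-jitter delay (in seconds) per retry attempt."""
    for attempt in range(max_retries):
        # Exponential ceiling: base * 2^attempt, clamped to cap.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: scale the ceiling by a random factor in [0, 1).
        yield rng() * ceiling


def call_with_backoff(fetch, max_retries=5, sleep=time.sleep):
    """Call fetch() and retry while it reports HTTP 429.

    `fetch` is a hypothetical callable returning (status, body).
    """
    status, body = fetch()
    for delay in backoff_delays(max_retries):
        if status != 429:
            break
        sleep(delay)
        status, body = fetch()
    return status, body
```

Injecting `sleep` and `rng` keeps the retry logic testable without real waiting or randomness.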
Markdown Transpiler
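- AST parsing: converts the raw JSON into an abstract syntax tree so that complex components such as Table, Callout, and Code Block render accurately.
- Knowledge Stitcher: automatically detects and stitches together scattered API parameter pages (Input/Output/Schema), reassembling them into a single source of truth.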
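To illustrate the rendering step, here is a small sketch that walks a list of block nodes and emits Markdown. The block shape (`type`, `text`, `level`, `language`, `rows`) is a simplified, hypothetical stand-in and not Notion's actual JSON schema; the function names are likewise assumptions.

```python
def render_block(block):
    """Render one simplified block dict to a Markdown string."""
    kind = block["type"]
    text = block.get("text", "")
    if kind == "heading":
        return "#" * block.get("level", 1) + " " + text
    if kind == "code":
        fence = "`" * 3  # built at runtime to keep this example readable
        return f"{fence}{block.get('language', '')}\n{text}\n{fence}"
    if kind == "callout":
        # Render callouts as blockquotes, one "> " prefix per line.
        return "\n".join("> " + line for line in text.splitlines())
    if kind == "table":
        # First row is treated as the header row.
        header, *rows = block["rows"]
        lines = ["| " + " | ".join(header) + " |",
                 "|" + "---|" * len(header)]
        lines += ["| " + " | ".join(row) + " |" for row in rows]
        return "\n".join(lines)
    return text  # plain paragraph fallback


def render_page(blocks):
    """Join rendered blocks with blank lines between them."""
    return "\n\n".join(render_block(b) for b in blocks)
```

Dispatching on a node type keeps each component's rendering rule isolated, which is what makes pure-function testing of the renderer straightforward.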
Resilience & Failover
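- Dual-Domain Failover: tries `notion.so` first and automatically falls back to `notion.site` on failure, for 99.9% availability.
- Connection Pooling: uses `requests.Session` to maintain a TCP connection pool, reducing TLS handshake overhead and improving throughput on large crawls.
- Granular Checkpoints: records the status of each Page ID in SQLite/JSON, enabling full checkpoint resume.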
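A per-page checkpoint store along these lines could look like the sketch below. It uses only the standard-library `sqlite3` module; the `Checkpoint` class, the table layout, and the `'success'` status string are assumptions made for illustration.

```python
import sqlite3


class Checkpoint:
    """Tracks per-page crawl status so an interrupted run can resume."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pages ("
            "page_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
        )

    def mark(self, page_id, status):
        # Upsert and commit immediately, so a crash loses at most
        # the page that was in flight.
        self.db.execute(
            "INSERT OR REPLACE INTO pages (page_id, status) VALUES (?, ?)",
            (page_id, status),
        )
        self.db.commit()

    def pending(self, page_ids):
        """Return the page IDs not yet marked 'success', in input order."""
        done = {
            row[0]
            for row in self.db.execute(
                "SELECT page_id FROM pages WHERE status = 'success'"
            )
        }
        return [pid for pid in page_ids if pid not in done]
```

On restart, a crawler would pass its full page list to `pending()` and process only what comes back, which covers both never-attempted and previously failed pages.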
Confluence Integrator
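- BFS Traversal: uses breadth-first traversal so that parent pages are always created before their children, avoiding Orphan Pages.
- Smart Transform: automatically converts Mermaid blocks into Confluence Macros and repairs cross-page Internal Links.
- Auto-Root Management: automatically creates `Notion_KB` at the Space root and supports `--clean` to delete it recursively for a clean redeploy.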
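The parent-before-child ordering can be sketched with a standard breadth-first traversal. The `children` mapping (page ID to list of child IDs) and the function name are hypothetical; the real uploader would create each page in Confluence as it is dequeued.

```python
from collections import deque


def bfs_upload_order(root, children):
    """Return page IDs in breadth-first order, parents before children."""
    order = []
    seen = {root}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        order.append(page)
        for child in children.get(page, []):
            if child not in seen:  # guard against cycles and duplicates
                seen.add(child)
                queue.append(child)
    return order
```

Because a page is only enqueued while its parent is being processed, every parent appears earlier in the returned order than any of its descendants.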
Legacy Mode (Fallback)
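- Playwright Renderer: browser-emulation mode that parses the DOM and uses the Breadcrumb to determine paths, handling special edge cases the API cannot cover.
- Interactive Crawling: supports Auto-Scroll to trigger Lazy Loading and automatically expands Toggles and Databases.
- Stealth Mode: uses Headed mode with random delays (3-10 s) to get past Cloudflare verification.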
Test Suite (Quality Gate)
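- 130 Unit Tests: pytest covers the three core modules, including rendering logic and merge strategy, so that each change does not break existing behavior.
- Zero-IO Pure Testing: RichText conversion, Block rendering, and title disambiguation logic are tested as pure functions, with no need to mock anything external.
- Filesystem Isolation: merge and tree-construction tests use the pytest `tmp_path` fixture, so they never pollute the real filesystem.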
DEV_EXPERIENCE (DX)
Developer-friendly mechanisms for fast iteration.
> Mocking API responses...
> Ready. (0ms latency)
Offline Replay
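When developing the transformation logic, read directly from a local Snapshot with no network access at all, which speeds up iteration by about 100x.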
> [SKIP] POST /wiki/rest/api/content
> No changes applied.
Dry Run Mode
Simulates the transformation and upload flow, emitting logs only and performing no writes, so nothing conflicts with existing Confluence data.
> Failed: 3 pages (Rate Limited)
> Resuming from last success...
Smart Resume
On restart, the program automatically reads the Checkpoint, skips pages already marked Success, and retries only the failed ones.
CLI_COMMANDS
Crawl quickly through the internal API, without launching a browser. Suited to large-batch migrations.
python crawl_notion_api.py --token $NOTION_TOKEN_V2 --page $ROOT_PAGE_ID
Merge the scattered files into API documentation and start a local MkDocs server for preview.
python build_knowledge_base.py && mkdocs serve
Automatically reads the output directory and uploads it to the target Space. `--source all` uploads everything.
python upload_to_confluence.py --source all --space ENGINEERING
FALLBACK_STRATEGY
Why We Need a Fallback
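Notion's internal API (loadPageChunk) is a non-public endpoint that may change or tighten its verification at any time. Once the crawler is identified as an automation tool, API requests will return 403 or 429 outright.
For this reason, the project ships a Playwright browser-emulation mode as a complete fallback. It renders pages in a real browser and bypasses the API layer entirely, so the migration can be completed in any environment.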
API vs Playwright Comparison
| | API Mode | Playwright |
|---|---|---|
| Speed | ~10 min | ~3 hrs |
| Anti-detection | Header spoofing | Real browser |
| API dependency | Non-public API | No API dependency |
| Cloudflare | May be blocked | Fully bypassed |
| Checkpoint resume | ✓ | ✓ |
| Memory | <50MB | ~500MB |
TEST_SUITE
pytest: 130 tests across 3 core modules
MODULE COVERAGE
TEST CATEGORIES
pip install pytest && pytest -v --tb=short