Overview
Note
tarzi supports only Linux and macOS. Windows is not supported.
What is tarzi?
Tarzi is a unified search interface designed for Retrieval-Augmented Generation (RAG) and agentic systems built on large language models. Search is a core capability in these systems, yet most search engine providers (SEPs) put their APIs behind paywalls or strict rate limits. Tarzi uses browser automation and web crawling to remove these barriers, supporting token-free queries across multiple search engines. With a single dependency, you can integrate different SEPs and switch between them as needed.
Key Components
Converter Module
The Converter module is responsible for transforming raw HTML content into various structured formats:
HTML to Markdown: Clean, readable text format perfect for AI training data
HTML to JSON: Structured data with metadata (title, links, images, content)
HTML to YAML: Human-readable structured format for configuration and data storage
Key features:
Intelligent content extraction
Metadata preservation
Customizable output formatting
Memory-efficient processing
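To make the HTML to JSON conversion concrete, here is a minimal sketch built only on Python's standard library. It is not tarzi's implementation or API: the `PageExtractor` class and `html_to_json` function are hypothetical names chosen for illustration, and the output fields simply mirror the metadata listed above (title, links, images, content).

```python
import json
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Hypothetical helper: collects title, links, images, and visible text."""

    SKIPPED = {"script", "style"}  # tags whose text is not page content

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self.images = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title = text
        elif not self._skip_depth:
            self.text_parts.append(text)


def html_to_json(html: str) -> str:
    """Convert raw HTML into a JSON document with basic metadata."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return json.dumps(
        {
            "title": parser.title,
            "links": parser.links,
            "images": parser.images,
            "content": " ".join(parser.text_parts),
        },
        indent=2,
    )
```

Calling `html_to_json(page_source)` returns a JSON string. A Markdown or YAML converter could reuse the same extraction pass and change only the final serialization step.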
Fetcher Module
The Fetcher module handles web page retrieval with multiple strategies:
- HTTP Mode
Fast, lightweight HTTP requests for static content
- Browser Automation
Full browser automation for JavaScript-heavy sites:
Headless mode for server environments
Headed mode for debugging
- Proxy Support
Custom proxy support for all fetch modes
Key features:
Multiple fetch strategies
Automatic retry logic
Custom user agent support
Timeout configuration
Cookie and session management
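The retry, timeout, and user agent features can be sketched as follows. This is an illustration using Python's standard library, not tarzi's fetcher: `http_fetch`, `fetch_with_retry`, the default user agent string, and the exponential backoff policy are all assumptions made for the example.

```python
import time
import urllib.request
from typing import Callable


def http_fetch(url: str, user_agent: str, timeout: float) -> str:
    """Plain HTTP fetch for static content (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_with_retry(
    url: str,
    fetch: Callable[[str, str, float], str] = http_fetch,
    user_agent: str = "example-agent/0.1",  # assumed placeholder value
    timeout: float = 10.0,
    retries: int = 3,
    backoff: float = 0.5,
) -> str:
    """Call `fetch` up to `retries` times, doubling the wait after each failure."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url, user_agent, timeout)
        except OSError as error:  # URLError and socket timeouts derive from OSError
            last_error = error
            if attempt < retries - 1:
                time.sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"giving up on {url} after {retries} attempts") from last_error
```

Passing the fetch strategy as a parameter keeps the retry loop independent of how the page is retrieved, so a browser-based fetcher with the same signature could be swapped in for JavaScript-heavy sites.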
Search Module
The Search module provides comprehensive search engine integration with a unified parser architecture:
- Base Parser Architecture
All search engines inherit from base parser traits:
BaseSearchParser: Core trait with name, engine type, and support checking
WebSearchParser: HTML-based parsing with parse_html() method
ApiSearchParser: JSON-based parsing with parse_json() method
UnifiedParser: Combines web and API parsing capabilities
- Browser-Based Search
Scrape search results directly from search engine pages:
Support for Google, Bing, DuckDuckGo, Brave Search, and Baidu
Custom search engine configuration
Anti-detection measures
- API-Based Search
Direct API integration for supported search engines:
Multiple API Providers: Brave, Google, Exa, Tavily, DuckDuckGo (more to come)
Automatic Provider Switching: Smart fallback when primary provider fails
Proxy Support: Full proxy support for all API providers
Structured Results: Consistent result format across all providers
- Parser Factory
Factory pattern for creating and managing parsers:
Mode-aware parser selection (WebQuery vs ApiQuery)
Custom parser registration
Automatic fallback for unsupported combinations
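The parser hierarchy and factory described above can be sketched in Python using abstract base classes in place of traits. The names `BaseSearchParser`, `WebSearchParser`, `ApiSearchParser`, `parse_html`, and `parse_json` come from the description above; everything else (`SearchMode`, `SearchResult`, `ExampleApiParser`, `ParserFactory` and its methods, and the specific fallback rule) is a hypothetical stand-in and may differ from tarzi's actual design.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum


class SearchMode(Enum):
    WEB_QUERY = "webquery"
    API_QUERY = "apiquery"


@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str


class BaseSearchParser(ABC):
    name = "base"

    @abstractmethod
    def supports(self, mode: SearchMode) -> bool:
        """Report whether this parser can handle the given mode."""


class WebSearchParser(BaseSearchParser):
    def supports(self, mode: SearchMode) -> bool:
        return mode is SearchMode.WEB_QUERY

    @abstractmethod
    def parse_html(self, html: str, limit: int) -> list[SearchResult]:
        """Extract results from a search engine results page."""


class ApiSearchParser(BaseSearchParser):
    def supports(self, mode: SearchMode) -> bool:
        return mode is SearchMode.API_QUERY

    @abstractmethod
    def parse_json(self, payload: str, limit: int) -> list[SearchResult]:
        """Extract results from an API response body."""


class ExampleApiParser(ApiSearchParser):
    """Hypothetical parser for a made-up JSON response shape."""

    name = "example"

    def parse_json(self, payload: str, limit: int) -> list[SearchResult]:
        items = json.loads(payload).get("results", [])
        return [
            SearchResult(item["title"], item["url"], item.get("snippet", ""))
            for item in items[:limit]
        ]


class ParserFactory:
    def __init__(self):
        self._parsers = {}

    def register(self, engine: str, mode: SearchMode, parser: BaseSearchParser):
        self._parsers[(engine, mode)] = parser

    def get_parser(self, engine: str, mode: SearchMode) -> BaseSearchParser:
        parser = self._parsers.get((engine, mode))
        if parser is not None:
            return parser
        # Assumed fallback rule: reuse the engine's parser from another mode.
        for (name, _), candidate in self._parsers.items():
            if name == engine:
                return candidate
        raise LookupError(f"no parser registered for engine {engine!r}")
```

Keying the registry on the (engine, mode) pair keeps custom parser registration to a single `register` call, and the caller can check `supports()` on the returned parser to detect when a fallback was used.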
Key features:
Multiple search engine support
Configurable result limits
Search result ranking
Snippet extraction
URL validation and cleaning
Extensible parser architecture
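Provider switching, result limits, and URL cleaning can be combined in one short sketch. The code below is illustrative only: `clean_url`, `search_with_fallback`, the provider callables, and the result dictionary keys are hypothetical and do not reflect tarzi's real interface or its cleaning rules.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit, urlunsplit

# A provider takes (query, limit) and returns raw result dictionaries.
Provider = Callable[[str, int], list[dict]]


def clean_url(url: str) -> Optional[str]:
    """Keep only http(s) URLs, lowercase the host, and drop the fragment."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, parts.query, "")
    )


def search_with_fallback(
    query: str, providers: list[tuple[str, Provider]], limit: int = 5
) -> list[dict]:
    """Try each provider in order and return the first usable result set."""
    errors = []
    for name, provider in providers:
        try:
            raw = provider(query, limit)
        except Exception as error:  # any provider failure triggers the fallback
            errors.append(f"{name}: {error}")
            continue
        results = []
        for item in raw:
            url = clean_url(item.get("url", ""))
            if url:
                results.append(
                    {
                        "title": item.get("title", ""),
                        "url": url,
                        "snippet": item.get("snippet", ""),
                        "provider": name,
                    }
                )
        if results:
            return results[:limit]
        errors.append(f"{name}: no usable results")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Because every provider's output is normalized into the same dictionary shape, callers do not need to know which provider ultimately answered the query.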
Getting Started
Ready to get started? Check out the Installation guide and the Quick Start Guide to begin using tarzi in your projects.
For detailed examples and advanced usage patterns, see our Examples section.