Overview

Note

tarzi supports only Linux and macOS. Windows is not supported.

What is tarzi?

Tarzi is a unified search interface designed for Retrieval-Augmented Generation (RAG) and agentic systems built on large language models. Search is core functionality in these systems, yet most search engine providers (SEPs) put their APIs behind paywalls or strict rate limits. Tarzi uses browser automation and web crawling to remove these barriers, supporting token-free queries across multiple search engines. With a single dependency, you can integrate several SEPs and switch between them as needed.

Key Components

Converter Module

The Converter module is responsible for transforming raw HTML content into various structured formats:

HTML to Markdown: Clean, readable text format perfect for AI training data
HTML to JSON: Structured data with metadata (title, links, images, content)
HTML to YAML: Human-readable structured format for configuration and data storage

Key features:

Intelligent content extraction
Metadata preservation
Customizable output formatting
Memory-efficient processing
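
As a rough illustration of the HTML to JSON path, the sketch below pulls the title, links, images, and visible text out of a page using only the Python standard library. It is not tarzi's API: `PageExtractor` and `html_to_json` are hypothetical names, and the output keys simply mirror the metadata fields listed above.

```python
import json
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Hypothetical helper: collects title, links, images, and visible text."""

    SKIPPED = {"script", "style"}  # tags whose text is not page content

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self.images = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title = text
        elif self._skip_depth == 0:
            self.text_parts.append(text)


def html_to_json(html: str) -> str:
    """Convert an HTML string into a JSON document with basic metadata."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return json.dumps(
        {
            "title": parser.title,
            "links": parser.links,
            "images": parser.images,
            "content": " ".join(parser.text_parts),
        },
        indent=2,
    )
```

The same extracted dictionary could be serialized to YAML or rendered as Markdown; only the final serialization step would differ.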

Fetcher Module

The Fetcher module handles web page retrieval with multiple strategies:

HTTP Mode

Fast, lightweight HTTP requests for static content

Browser Automation

Full browser automation for JavaScript-heavy sites:

Headless mode for server environments
Headed mode for debugging

Proxy Support

Custom proxy support for all fetch modes

Key features:

Multiple fetch strategies
Automatic retry logic
Custom user agent support
Timeout configuration
Cookie and session management
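
To make the retry, timeout, and user agent features concrete, here is a minimal sketch of a plain HTTP fetch with exponential backoff, written against Python's standard `urllib`. It does not use tarzi itself: `fetch_with_retry`, its parameters, and the default user agent string are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, *, user_agent="example-agent/0.1", timeout=10.0,
                     retries=3, backoff=0.5, opener=urllib.request.urlopen):
    """Hypothetical helper: fetch `url`, retrying failed attempts.

    `opener` is injectable so the retry logic can be exercised without
    network access.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    last_error = None
    for attempt in range(retries):
        try:
            with opener(request, timeout=timeout) as response:
                return response.read().decode("utf-8", errors="replace")
        except (urllib.error.URLError, TimeoutError) as error:
            # This retries every URLError, including HTTP error statuses;
            # a production fetcher would likely be more selective.
            last_error = error
            if attempt < retries - 1:
                time.sleep(backoff * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise last_error
```

A call such as `fetch_with_retry("https://example.com", timeout=5.0)` covers only the static HTTP mode; JavaScript-heavy pages would need the browser automation path instead.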

Search Module

The Search module provides comprehensive search engine integration with a unified parser architecture:

Base Parser Architecture

All search engine parsers implement the base parser traits:

BaseSearchParser: Core trait with name, engine type, and support checking
WebSearchParser: HTML-based parsing with parse_html() method
ApiSearchParser: JSON-based parsing with parse_json() method
UnifiedParser: Combines web and API parsing capabilities
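
The layering of these traits can be sketched with Python abstract base classes. The trait names and the `parse_html()` / `parse_json()` methods come from the list above; the method signatures, the `SearchResult` fields, and `ToyApiParser` (with its made-up JSON shape) are assumptions for illustration, not tarzi's actual definitions.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class SearchResult:
    """Hypothetical result record shared by every parser."""
    title: str
    url: str
    snippet: str = ""


class BaseSearchParser(ABC):
    """Core contract: name, engine type, and support checking."""

    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def engine_type(self) -> str: ...

    @abstractmethod
    def supports(self, engine_type: str) -> bool: ...


class WebSearchParser(BaseSearchParser):
    """Parsers that read HTML result pages."""

    @abstractmethod
    def parse_html(self, html: str, limit: int) -> list[SearchResult]: ...


class ApiSearchParser(BaseSearchParser):
    """Parsers that read JSON API responses."""

    @abstractmethod
    def parse_json(self, payload: str, limit: int) -> list[SearchResult]: ...


class UnifiedParser(WebSearchParser, ApiSearchParser):
    """Combines both capabilities; concrete engines implement both methods."""


class ToyApiParser(ApiSearchParser):
    """Hypothetical parser for a made-up payload: {"results": [{...}]}."""

    def name(self) -> str:
        return "toy"

    def engine_type(self) -> str:
        return "toy"

    def supports(self, engine_type: str) -> bool:
        return engine_type == "toy"

    def parse_json(self, payload: str, limit: int) -> list[SearchResult]:
        items = json.loads(payload).get("results", [])
        return [
            SearchResult(item["title"], item["url"], item.get("snippet", ""))
            for item in items[:limit]
        ]
```

Because every parser returns the same `SearchResult` type, calling code does not need to know whether results came from scraped HTML or from an API response.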

Browser-Based Search

Scrape search results directly from search engine pages:

Support for Google, Bing, DuckDuckGo, Brave Search, and Baidu
Custom search engine configuration
Anti-detection measures

API-Based Search

Direct API integration for supported search engines:

Multiple API Providers: Brave, Google, Exa, Tavily, DuckDuckGo (more to come)
Automatic Provider Switching: Smart fallback when primary provider fails
Proxy Support: Full proxy support for all API providers
Structured Results: Consistent result format across all providers
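
Automatic provider switching can be pictured as trying providers in priority order and returning the first successful result. The sketch below assumes each provider is a plain callable taking a query and a limit; `search_with_fallback` and that calling convention are hypothetical, not tarzi's interface.

```python
def search_with_fallback(providers, query, limit=5):
    """Try each (name, provider) pair in order; return the first success.

    Returns a (provider_name, results) tuple so the caller can see which
    provider actually answered.
    """
    errors = {}
    for name, provider in providers:
        try:
            return name, provider(query, limit)
        except Exception as error:  # a real client would catch narrower errors
            errors[name] = error
    raise RuntimeError(f"all providers failed: {errors}")
```

Ordering the list by preference (for example, the provider with the most generous quota first) keeps the fallback path for failures only.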

Parser Factory

Factory pattern for creating and managing parsers:

Mode-aware parser selection (WebQuery vs ApiQuery)
Custom parser registration
Automatic fallback for unsupported combinations
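
One way to read the points above is as a registry keyed by engine and query mode. In the sketch below, the mode names follow the WebQuery and ApiQuery labels in the text, while `ParserFactory`, its methods, and the `"*"` wildcard used for fallback are assumptions; the actual fallback rule may differ.

```python
from enum import Enum


class QueryMode(Enum):
    WEB_QUERY = "webquery"
    API_QUERY = "apiquery"


class ParserFactory:
    """Hypothetical registry mapping (engine, mode) pairs to parsers."""

    def __init__(self):
        self._registry = {}

    def register(self, engine, mode, parser):
        """Register a custom parser for an engine and mode."""
        self._registry[(engine, mode)] = parser

    def get(self, engine, mode):
        """Return the engine-specific parser, else the generic one for the mode."""
        for key in ((engine, mode), ("*", mode)):
            if key in self._registry:
                return self._registry[key]
        raise KeyError(f"no parser registered for {engine!r} in {mode.name}")
```

Registering a generic parser under `"*"` for each mode means an unsupported engine and mode combination degrades to a best-effort parser instead of failing outright.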

Key features:

Multiple search engine support
Configurable result limits
Search result ranking
Snippet extraction
URL validation and cleaning
Extensible parser architecture

Getting Started

Ready to get started? Check out our Installation guide and Quick Start Guide to begin using tarzi in your projects.

For detailed examples and advanced usage patterns, see our Examples section.