tarzi - Rust-native lite search for AI applications

tarzi is a Rust-native search library built for AI applications. It covers content conversion, web fetching, and search engine integration, and it can run searches either through a browser or through a provider API.

Key Features

🔧 Dual Implementation

Native Rust library with Python bindings and CLI tools

🔄 Content Conversion

Convert raw HTML to Markdown, JSON, or YAML formats

🌐 Web Fetching

Fetch web pages with optional JavaScript rendering support

🔍 Search Integration

Query search engines in browser mode (headless, headed, or an existing browser instance) or in API mode

🎯 Web Search Engines

Support for Bing, Google, DuckDuckGo, Brave Search, and custom engines

🚀 API Search Providers

Direct API integration with Brave Search, Exa, Tavily, and DuckDuckGo

🔄 Automatic Provider Switching

Smart fallback between API providers for enhanced reliability

🔒 Proxy Support

Use proxies in both browser-based and API-based operations

⚑ End-to-End Pipeline

Complete workflow from search queries to content extraction for AI applications
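
The automatic provider switching listed above can be pictured with a small, library-independent sketch. This is not tarzi's implementation: `search_with_fallback` and the `(name, callable)` provider shape are hypothetical names chosen for illustration, and a real client would likely catch narrower error types.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical provider shape: a display name plus a callable taking (query, limit).
Provider = Tuple[str, Callable[[str, int], List[dict]]]


def search_with_fallback(
    query: str, limit: int, providers: Iterable[Provider]
) -> Tuple[str, List[dict]]:
    """Try providers in order and return (provider_name, results) from the first success."""
    errors = []
    for name, search in providers:
        try:
            results = search(query, limit)
        except Exception as exc:  # assumption: any exception means "try the next one"
            errors.append(f"{name}: {exc}")
            continue
        if results:
            return name, results
        # An empty result list is treated as a soft failure in this sketch.
        errors.append(f"{name}: no results")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Ordering the provider list by preference keeps the policy explicit: the first provider that returns a non-empty result wins, and the collected error messages make it clear why earlier ones were skipped.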

Quick Start

Python

pip install tarzi

import tarzi

# Convert HTML to Markdown
markdown = tarzi.convert_html("<h1>Hello</h1>", "markdown")

# Fetch web page
content = tarzi.fetch_url("https://example.com", js=True)

# Search web (browser-based)
results = tarzi.search_web("python programming", "webquery", 10)

# Search using API providers (requires API keys)
results = tarzi.search_web("machine learning", "apiquery", 10)

Rust

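cargo add tarzi

// Config is used below; it is assumed to be exported from the crate root like the other types.
use tarzi::{Config, Converter, WebFetcher, SearchEngine, Format, FetchMode, SearchMode};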

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Convert HTML to Markdown
    let converter = Converter::new();
    let markdown = converter.convert("<h1>Hello</h1>", Format::Markdown).await?;

    // Fetch web page
    let mut fetcher = WebFetcher::new();
    let content = fetcher.fetch(
        "https://example.com",
        FetchMode::BrowserHeadless,
        Format::Markdown
    ).await?;

    // Search web (browser-based)
    let mut search_engine = SearchEngine::new();
    let results = search_engine.search(
        "agentic AI",
        SearchMode::WebQuery,
        5
    ).await?;

    // Search using API providers (requires API keys)
    let mut api_search_engine = SearchEngine::from_config(&Config::new());
    let api_results = api_search_engine.search(
        "machine learning",
        SearchMode::ApiQuery,
        5
    ).await?;

    Ok(())
}

CLI

# Install the CLI tool
cargo install tarzi

# Convert HTML to Markdown
tarzi convert --input "<h1>Hello</h1>" --format markdown

# Fetch web page with JavaScript rendering
tarzi fetch --url "https://example.com" --mode browser_headless --format json

# Search and fetch content (browser-based)
tarzi search-and-fetch \
  --query "agentic AI" \
  --search-mode webquery \
  --fetch-mode plain_request \
  --format markdown \
  --limit 5

# Search using API providers (requires API keys)
tarzi search-and-fetch \
  --query "machine learning" \
  --search-mode apiquery \
  --fetch-mode plain_request \
  --format markdown \
  --limit 5
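
The search-and-fetch flow above can also be sketched in Python as two composed steps. The sketch below is deliberately independent of tarzi: `search_and_fetch` and its `search` and `fetch` parameters are hypothetical, and how tarzi's search results expose their URLs is not stated here, so the sketch assumes the caller passes a `search` callable that already returns plain URL strings.

```python
from typing import Callable, Dict, List, Tuple


def search_and_fetch(
    query: str,
    limit: int,
    search: Callable[[str, int], List[str]],
    fetch: Callable[[str], str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Search, then fetch each hit.

    Returns (pages, failures): pages maps URL to fetched content,
    failures maps URL to the error message for that URL.
    """
    pages: Dict[str, str] = {}
    failures: Dict[str, str] = {}
    for url in search(query, limit)[:limit]:
        if url in pages or url in failures:
            continue  # skip duplicate hits
        try:
            pages[url] = fetch(url)
        except Exception as exc:  # one bad page should not abort the whole run
            failures[url] = str(exc)
    return pages, failures
```

With tarzi installed, `search` and `fetch` would presumably be thin wrappers around `tarzi.search_web` and `tarzi.fetch_url` from the Python quick start. Keeping them as parameters lets the pipeline be exercised with stubs.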

Use Cases

🤖 AI Data Collection

Gather and process web content for training data or knowledge bases

📊 Research Automation

Automate web research workflows for academic or business intelligence

🔍 Content Aggregation

Build content aggregation systems that convert web pages to structured data

🕷️ Web Scraping Pipelines

Create robust web scraping pipelines with built-in retry logic and format conversion

🔄 API Development

Use as a backend service for search and content extraction APIs

⚑ High-Performance Search

Leverage API providers for faster, more reliable search results

🛡️ Enterprise Search Solutions

Deploy with proxy support and multiple API providers for enterprise environments
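
For readers building scraping pipelines, the retry behaviour mentioned above can be illustrated with a generic exponential-backoff wrapper. How tarzi implements its built-in retries is not described here, so this is only a sketch under assumed defaults. The `retry` helper, its parameters, and the doubling delay schedule are all hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the delay schedule can be observed or skipped in tests. Production code would normally leave it at the default.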

Support

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
