tarzi - Rust-native lite search for AI applications
tarzi is a Rust-native search library designed for AI applications. It provides a toolkit for content conversion, web fetching, and search engine integration, and supports both browser-based and API-based search.
Key Features
- Dual Implementation
Native Rust library with Python bindings and CLI tools
- Content Conversion
Convert raw HTML to Markdown, JSON, or YAML formats
- Web Fetching
Fetch web pages with optional JavaScript rendering support
- Search Integration
Query search engines using browser mode (headless/headed/existing) or API mode
- Web Search Engines
Support for Bing, Google, DuckDuckGo, Brave Search, and custom engines
- API Search Providers
Direct API integration with Brave Search, Exa, Tavily, and DuckDuckGo
- Automatic Provider Switching
Smart fallback between API providers for enhanced reliability
- Proxy Support
Use proxies in both browser-based and API-based operations
- End-to-End Pipeline
Complete workflow from search queries to content extraction for AI applications
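The automatic provider switching listed above can be pictured as trying providers in priority order and moving to the next one when a provider fails. The following is a minimal sketch of that idea, not tarzi's actual implementation: `search_with_fallback`, `ProviderError`, and the two stand-in providers are hypothetical names introduced only for illustration, and no network access is involved.

```python
from typing import Callable, Dict, List, Tuple


class ProviderError(Exception):
    """Hypothetical error raised when a provider cannot serve a query."""


# A provider is modelled as a callable: (query, limit) -> list of result dicts.
Provider = Callable[[str, int], List[Dict[str, str]]]


def search_with_fallback(
    query: str,
    limit: int,
    providers: List[Tuple[str, Provider]],
) -> Tuple[str, List[Dict[str, str]]]:
    """Try each provider in order; return the first successful (name, results)."""
    errors = []
    for name, provider in providers:
        try:
            return name, provider(query, limit)
        except ProviderError as exc:
            # Record the failure and fall through to the next provider.
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stand-in providers for demonstration only (assumptions, not real tarzi providers).
def failing_provider(query: str, limit: int) -> List[Dict[str, str]]:
    raise ProviderError("missing API key")


def working_provider(query: str, limit: int) -> List[Dict[str, str]]:
    return [
        {"title": f"Result {i} for {query}", "url": f"https://example.com/{i}"}
        for i in range(limit)
    ]


used, results = search_with_fallback(
    "machine learning",
    2,
    [("primary", failing_provider), ("secondary", working_provider)],
)
print(used, len(results))  # prints "secondary 2"
```

Keeping the provider list ordered makes the priority explicit, and collecting the per-provider errors means the final failure message still explains why each provider was skipped.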
Quick Start
Python
pip install tarzi
import tarzi
# Convert HTML to Markdown
markdown = tarzi.convert_html("<h1>Hello</h1>", "markdown")
# Fetch web page
content = tarzi.fetch_url("https://example.com", js=True)
# Search web (browser-based)
results = tarzi.search_web("python programming", "webquery", 10)
# Search using API providers (requires API keys)
results = tarzi.search_web("machine learning", "apiquery", 10)
Rust
cargo add tarzi
use tarzi::{Config, Converter, WebFetcher, SearchEngine, Format, FetchMode, SearchMode};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Convert HTML to Markdown
    let converter = Converter::new();
    let markdown = converter.convert("<h1>Hello</h1>", Format::Markdown).await?;

    // Fetch web page
    let mut fetcher = WebFetcher::new();
    let content = fetcher.fetch(
        "https://example.com",
        FetchMode::BrowserHeadless,
        Format::Markdown
    ).await?;

    // Search web (browser-based)
    let mut search_engine = SearchEngine::new();
    let results = search_engine.search(
        "agentic AI",
        SearchMode::WebQuery,
        5
    ).await?;

    // Search using API providers (requires API keys)
    let mut api_search_engine = SearchEngine::from_config(&Config::new());
    let api_results = api_search_engine.search(
        "machine learning",
        SearchMode::ApiQuery,
        5
    ).await?;

    Ok(())
}
CLI
# Install the CLI tool
cargo install tarzi
# Convert HTML to Markdown
tarzi convert --input "<h1>Hello</h1>" --format markdown
# Fetch web page with JavaScript rendering
tarzi fetch --url "https://example.com" --mode browser_headless --format json
# Search and fetch content (browser-based)
tarzi search-and-fetch \
  --query "agentic AI" \
  --search-mode webquery \
  --fetch-mode plain_request \
  --format markdown \
  --limit 5
# Search using API providers (requires API keys)
tarzi search-and-fetch \
  --query "machine learning" \
  --search-mode apiquery \
  --fetch-mode plain_request \
  --format markdown \
  --limit 5
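When the CLI is driven from a script, it can help to build the argument list programmatically rather than concatenating strings. The sketch below assembles a `tarzi search-and-fetch` invocation using only the flags shown above. The helper name `build_search_and_fetch_cmd` is hypothetical, and restricting the search mode to `webquery` and `apiquery` is an assumption based on the two modes that appear in this document.

```python
import shlex
from typing import List


def build_search_and_fetch_cmd(
    query: str,
    search_mode: str = "webquery",
    fetch_mode: str = "plain_request",
    fmt: str = "markdown",
    limit: int = 5,
) -> List[str]:
    """Build the argv list for `tarzi search-and-fetch` from the flags shown above."""
    # Assumption: only the two search modes shown in this document are accepted.
    if search_mode not in ("webquery", "apiquery"):
        raise ValueError(f"unsupported search mode: {search_mode}")
    if limit < 1:
        raise ValueError("limit must be positive")
    return [
        "tarzi", "search-and-fetch",
        "--query", query,
        "--search-mode", search_mode,
        "--fetch-mode", fetch_mode,
        "--format", fmt,
        "--limit", str(limit),
    ]


cmd = build_search_and_fetch_cmd("agentic AI", search_mode="apiquery")
print(shlex.join(cmd))  # shell-quoted form, convenient for logging

# To actually run it (requires the tarzi CLI on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the list directly to `subprocess.run` avoids shell quoting problems for queries that contain spaces or quotes.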
Use Cases
- AI Data Collection
Gather and process web content for training data or knowledge bases
- Research Automation
Automate web research workflows for academic or business intelligence
- Content Aggregation
Build content aggregation systems that convert web pages to structured data
- Web Scraping Pipelines
Create robust web scraping pipelines with built-in retry logic and format conversion
- API Development
Use as a backend service for search and content extraction APIs
- High-Performance Search
Leverage API providers for faster, more reliable search results
- Enterprise Search Solutions
Deploy with proxy support and multiple API providers for enterprise environments
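The retry logic mentioned for scraping pipelines generally follows a retry-with-exponential-backoff pattern. The sketch below illustrates that general pattern only and says nothing about how tarzi implements retries internally: the `retry` helper, its parameters, and `flaky_fetch` are hypothetical names, and the stand-in fetch function replaces a real network call.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` up to `attempts` times, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


# Demonstration with a stand-in fetch that fails twice before succeeding.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "<h1>Hello</h1>"


delays = []  # record delays instead of sleeping, so the demo runs instantly
html = retry(flaky_fetch, attempts=3, base_delay=0.5, sleep=delays.append)
print(html, delays)  # prints "<h1>Hello</h1> [0.5, 1.0]"
```

Injecting the `sleep` function keeps the helper easy to test, since the delays can be recorded rather than waited out.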
Support
Documentation: https://tarzirs.readthedocs.io/
Source Code: https://github.com/mirasurf/tarzi.rs
Crates.io: https://crates.io/crates/tarzi
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.