
Synchronization Methods

GitBridge provides two methods for synchronizing GitHub repositories, so you can still reach your code when network restrictions block one of them.

Overview

| Method | Speed | Requirements | Use Case |
|---|---|---|---|
| API | Fast ⚡ | GitHub token (for private repos) | Default method, most efficient |
| Browser | Slow 🐢 | Modern browser (Chrome/Firefox/Edge) | When API access is blocked |

API Synchronization Method

The API method uses GitHub's REST API to efficiently sync repositories.

How It Works

sequenceDiagram
    participant GitBridge
    participant GitHub API
    participant Local Files

    GitBridge->>GitHub API: Request repository tree
    GitHub API-->>GitBridge: Return file list with SHAs
    GitBridge->>Local Files: Compare with cached SHAs
    GitBridge->>GitHub API: Request changed files only
    GitHub API-->>GitBridge: Return file contents
    GitBridge->>Local Files: Save updated files
    GitBridge->>Local Files: Update metadata cache
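
The compare step in the diagram can be illustrated with a minimal sketch. It assumes the remote file list has the shape returned by GitHub's Git Trees endpoint (entries with `path`, `type` and `sha` keys) and that the metadata cache is a plain `{path: sha}` mapping; `git_blob_sha` and `changed_paths` are hypothetical helpers, not GitBridge's API. GitHub reports Git blob SHAs, which hash a `blob <size>\0` header followed by the file bytes, so a local file can also be checked against the tree without downloading it.

```python
import hashlib


def git_blob_sha(data: bytes) -> str:
    """Compute the Git blob SHA-1 that GitHub reports for a file's contents."""
    header = f"blob {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


def changed_paths(tree_entries: list[dict], cached_shas: dict[str, str]) -> list[str]:
    """Return paths whose remote SHA differs from the cached SHA (or that are new).

    tree_entries: items like {"path": "README.md", "type": "blob", "sha": "..."}
    cached_shas:  hypothetical metadata cache mapping path -> last synced SHA
    """
    changed = []
    for entry in tree_entries:
        # Skip directories ("tree") and submodules ("commit"); only files are downloaded
        if entry.get("type") != "blob":
            continue
        if cached_shas.get(entry["path"]) != entry["sha"]:
            changed.append(entry["path"])
    return changed


if __name__ == "__main__":
    remote = [
        {"path": "README.md", "type": "blob", "sha": git_blob_sha(b"hello\n")},
        {"path": "src", "type": "tree", "sha": "0" * 40},
        {"path": "src/app.py", "type": "blob", "sha": git_blob_sha(b"print('v2')\n")},
    ]
    cache = {
        "README.md": git_blob_sha(b"hello\n"),
        "src/app.py": git_blob_sha(b"print('v1')\n"),
    }
    print(changed_paths(remote, cache))  # ['src/app.py']
```

Files that appear in the cache but no longer in the tree (deletions) would need a separate pass.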

Features

  • Incremental Updates: Only downloads changed files
  • Parallel Downloads: Multiple files downloaded simultaneously
  • SHA Verification: Ensures file integrity
  • Efficient for Large Repos: Minimal bandwidth usage
  • Branch/Tag Support: Sync any ref (branch, tag, commit)
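
Parallel downloads can be sketched with a thread pool sized by `parallel_downloads`. This is a minimal illustration, not GitBridge's implementation: `fetch` stands in for whatever function actually downloads one file (for example, a request to GitHub's Git Blobs endpoint), and `download_all` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def download_all(
    paths: list[str],
    fetch: Callable[[str], bytes],
    parallel_downloads: int = 5,
) -> tuple[dict[str, bytes], dict[str, Exception]]:
    """Fetch many files concurrently; return (contents, errors), both keyed by path."""
    contents: dict[str, bytes] = {}
    errors: dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=parallel_downloads) as pool:
        # Submit every download up front; the pool runs at most
        # `parallel_downloads` of them at the same time.
        futures = {pool.submit(fetch, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                contents[path] = future.result()
            except Exception as exc:  # keep going; report failures at the end
                errors[path] = exc
    return contents, errors
```

Collecting errors instead of aborting on the first failure lets a caller retry only the files that failed.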

Configuration

Bash
gitbridge sync \
  --repo https://github.com/user/repo \
  --local ~/projects/repo \
  --method api \
  --token YOUR_TOKEN
YAML
sync:
  method: api
  incremental: true
  parallel_downloads: 5
  chunk_size: 1048576  # 1MB chunks

auth:
  token: ${GITHUB_TOKEN}
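
The `${GITHUB_TOKEN}` placeholder keeps the token itself out of the config file. Assuming GitBridge substitutes environment variables in string values (check the behavior of your version), the substitution works like Python's `os.path.expandvars`. `resolve_env` below is a hypothetical helper applied to an already-parsed config dict, for example one loaded with PyYAML.

```python
import os
from typing import Any


def resolve_env(value: Any) -> Any:
    """Recursively expand $VAR / ${VAR} references in strings inside a parsed config."""
    if isinstance(value, str):
        return os.path.expandvars(value)  # references to unset variables are left unchanged
    if isinstance(value, dict):
        return {key: resolve_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env(item) for item in value]
    return value  # numbers, booleans and None pass through untouched


if __name__ == "__main__":
    os.environ["GITHUB_TOKEN"] = "ghp_example"  # placeholder value for the demo
    config = {
        "sync": {"method": "api", "parallel_downloads": 5},
        "auth": {"token": "${GITHUB_TOKEN}"},
    }
    print(resolve_env(config)["auth"]["token"])  # ghp_example
```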

API Rate Limits

GitHub API has rate limits:

  • Unauthenticated: 60 requests/hour
  • Authenticated: 5,000 requests/hour
  • GitHub Enterprise: Varies by installation

Avoiding Rate Limits

  • Always use authentication token
  • Enable incremental mode
  • Cache metadata between syncs
  • Use --verbose to monitor API usage
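
At the authenticated limit of 5,000 requests/hour, a sync that sustains more than about 1.4 requests per second (5,000 / 3,600) will run out before the hour resets. GitHub reports the remaining budget in `X-RateLimit-Remaining` and the reset time (UTC epoch seconds) in `X-RateLimit-Reset` on REST API responses. A minimal sketch of pausing on those headers, assuming they are available as a mapping (as with `requests.Response.headers`); `seconds_until_reset` and its `reserve` margin are hypothetical, not GitBridge options.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(
    headers: Mapping[str, str],
    reserve: int = 0,
    now: Optional[float] = None,
) -> float:
    """Return how long to pause before the next GitHub API request.

    Returns 0 while more than `reserve` requests remain; otherwise the number of
    seconds until the X-RateLimit-Reset timestamp (never negative).
    """
    # HTTP header names are case-insensitive; normalize for a plain dict.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", "1"))
    if remaining > reserve:
        return 0.0
    reset_at = int(lowered.get("x-ratelimit-reset", "0"))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

A caller would `time.sleep(seconds_until_reset(response.headers))` between requests.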

Advanced Options

YAML
sync:
  method: api

  # Performance tuning
  parallel_downloads: 10      # Number of concurrent downloads
  chunk_size: 2097152         # 2MB chunks for large files
  retry_count: 3              # Retry failed downloads
  retry_delay: 5              # Seconds between retries

  # Filtering
  ignore_patterns:            # Files to skip
    - "*.log"
    - ".DS_Store"
    - "node_modules/"

  # Large files
  skip_large_files: true      # Skip files > 100MB
  large_file_size: 104857600  # 100MB threshold
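
The filtering and retry options can be sketched as two small functions. This assumes `ignore_patterns` behaves roughly like `.gitignore` (a trailing `/` means "any directory with this name", other patterns match the file name or full path) and that `retry_delay` is a fixed pause; both semantics are assumptions, and `should_skip` and `with_retries` are hypothetical names.

```python
import fnmatch
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def should_skip(
    path: str,
    size: int,
    ignore_patterns: list[str],
    skip_large_files: bool = True,
    large_file_size: int = 104_857_600,
) -> bool:
    """Decide whether a repository file is excluded by the filtering options."""
    if skip_large_files and size > large_file_size:
        return True
    parts = path.split("/")
    for pattern in ignore_patterns:
        if pattern.endswith("/"):
            # "node_modules/" style: skip anything inside a directory with that name
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatch.fnmatchcase(parts[-1], pattern) or fnmatch.fnmatchcase(path, pattern):
            # "*.log" / ".DS_Store" style: match the file name or the full path
            return True
    return False


def with_retries(action: Callable[[], T], retry_count: int = 3, retry_delay: float = 5) -> T:
    """Run `action`, retrying up to `retry_count` extra times with a fixed delay."""
    for attempt in range(retry_count + 1):
        try:
            return action()
        except Exception:
            if attempt == retry_count:
                raise  # out of retries: surface the last error
            time.sleep(retry_delay)
    raise AssertionError("unreachable")
```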

Browser Automation Method

The browser method uses Playwright to automate a real browser, mimicking manual repository browsing.

How It Works

sequenceDiagram
    participant GitBridge
    participant Browser
    participant GitHub Website
    participant Local Files

    GitBridge->>Browser: Launch browser (Playwright)
    Browser->>GitHub Website: Navigate to repository
    GitBridge->>Browser: Click "Download ZIP"
    Browser->>GitHub Website: Request ZIP file
    GitHub Website-->>Browser: Return ZIP data
    GitBridge->>Browser: Extract file list from ZIP
    GitBridge->>Local Files: Compare with existing files
    GitBridge->>Browser: Download changed files
    GitBridge->>Local Files: Save updated files
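
The extract-and-compare steps can be sketched with Python's standard `zipfile` module, assuming the browser has already saved the archive. GitHub's "Download ZIP" archives place every file under a single `<repo>-<ref>/` folder, which the sketch strips. `zip_members` and `files_to_update` are hypothetical helpers; comparing raw bytes is one simple choice, and a real implementation might compare hashes instead.

```python
import io
import zipfile
from pathlib import Path


def zip_members(zip_bytes: bytes) -> dict[str, bytes]:
    """Map repo-relative paths to file contents, dropping the archive's top-level folder."""
    files = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # GitHub archives nest everything under "<repo>-<ref>/"
            _, _, rel_path = info.filename.partition("/")
            if rel_path:
                files[rel_path] = archive.read(info)
    return files


def files_to_update(zip_bytes: bytes, local_root: Path) -> dict[str, bytes]:
    """Return only the archive files that are missing locally or differ in content."""
    changed = {}
    for rel_path, data in zip_members(zip_bytes).items():
        target = local_root / rel_path
        if not target.is_file() or target.read_bytes() != data:
            changed[rel_path] = data
    return changed
```

Note that the whole archive is downloaded on every run; only the local writes are incremental.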

Features

  • Works Anywhere: If you can browse GitHub, it works
  • No API Token Required: Uses browser session
  • Handles JavaScript: Works with dynamic content
  • Cookie Support: Can use existing browser sessions
  • Proxy Aware: Uses browser's proxy settings

Configuration

Bash
gitbridge sync \
  --repo https://github.com/user/repo \
  --local ~/projects/repo \
  --method browser \
  --browser-path /usr/bin/chromium
YAML
sync:
  method: browser

browser:
  type: chromium             # chromium, firefox, or webkit
  executable_path: /usr/bin/chromium  # Optional: custom browser path
  headless: true            # Run without GUI
  timeout: 30000            # Page load timeout (ms)
  download_timeout: 300000  # File download timeout (ms)
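
A sketch of how these keys could map onto Playwright's Python API (`BrowserType.launch`, `Page.set_default_timeout`, `Page.expect_download`), assuming the `browser:` section has been loaded into a dict. `launch_options` and `download_repo_zip` are hypothetical helpers, not GitBridge functions, and the "Code" / "Download ZIP" locators are assumptions about GitHub's current page layout that may need adjusting.

```python
from pathlib import Path
from typing import Any


def launch_options(browser_cfg: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Translate the `browser:` section into a Playwright engine name and launch() kwargs."""
    engine = browser_cfg.get("type", "chromium")
    if engine not in ("chromium", "firefox", "webkit"):
        raise ValueError(f"unsupported browser type: {engine}")
    kwargs: dict[str, Any] = {"headless": browser_cfg.get("headless", True)}
    if browser_cfg.get("executable_path"):
        kwargs["executable_path"] = browser_cfg["executable_path"]
    return engine, kwargs


def download_repo_zip(repo_url: str, dest: Path, browser_cfg: dict[str, Any]) -> Path:
    """Open the repository page, click "Download ZIP", and save the archive to `dest`."""
    from playwright.sync_api import sync_playwright  # requires `pip install playwright`

    engine, kwargs = launch_options(browser_cfg)
    with sync_playwright() as p:
        browser = getattr(p, engine).launch(**kwargs)  # p.chromium / p.firefox / p.webkit
        try:
            page = browser.new_page(accept_downloads=True)
            page.set_default_timeout(browser_cfg.get("timeout", 30000))  # milliseconds
            page.goto(repo_url)
            # Assumed locators: open the "Code" menu, then click "Download ZIP"
            page.get_by_role("button", name="Code").click()
            with page.expect_download(timeout=browser_cfg.get("download_timeout", 300000)) as info:
                page.get_by_text("Download ZIP").click()
            info.value.save_as(dest)  # save before the browser closes
        finally:
            browser.close()
    return dest
```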

Browser Setup

Browser Installation

Playwright can automatically install browsers for you:

Bash
# Install Playwright browsers (recommended)
playwright install chromium
playwright install firefox
playwright install webkit

# Or install all browsers
playwright install
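
Alternatively, install a system browser and point executable_path at it. Chromium-based browsers such as Chrome and Edge generally work this way; Playwright relies on its own patched Firefox and WebKit builds, so a branded Firefox install may not work.

Bash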
# Windows (using winget)
winget install Google.Chrome
winget install Mozilla.Firefox

# macOS (using Homebrew)
brew install --cask google-chrome
brew install --cask firefox

# Linux (Ubuntu/Debian)
sudo apt-get install chromium-browser
sudo apt-get install firefox

Browser Management

Playwright manages browser binaries automatically. No separate driver installation needed!

Advanced Options

YAML
sync:
  method: browser

browser:
  type: chromium              # chromium, firefox, or webkit
  headless: true              # Run without GUI (faster)

  # Window settings
  window_size: "1920x1080"    # Browser window size
  device_scale_factor: 1      # Device pixel ratio

  # Performance
  args:
    - "--disable-gpu"         # Disable GPU acceleration
    - "--no-sandbox"          # Required for some environments
    - "--disable-dev-shm-usage"  # Overcome limited resource problems

  # Timeouts (in milliseconds)
  timeout: 30000              # Default timeout for operations
  download_timeout: 300000    # File download timeout

  # Browser context options
  user_agent: "GitBridge/1.0"   # Custom user agent
  locale: "en-US"             # Browser locale
  timezone: "America/New_York" # Browser timezone

  # Storage
  user_data_dir: ~/.gitbridge/browser  # Persistent browser data

  # Proxy (if not using system proxy)
  proxy:
    server: "http://proxy:8080"
    username: "proxy_user"
    password: "proxy_pass"
    bypass: "localhost,127.0.0.1"
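
These options split naturally between Playwright's `launch()` (headless, args, proxy) and `new_context()` (viewport, device scale factor, user agent, locale, time zone). A sketch of that mapping, assuming the YAML above has been loaded into a dict; `playwright_options` is a hypothetical helper. Note that Playwright names the time-zone option `timezone_id`, and that with `user_data_dir` set, `launch_persistent_context(user_data_dir, **launch, **context)` accepts both groups in one call.

```python
from typing import Any


def playwright_options(cfg: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split the `browser:` section into Playwright launch() and new_context() kwargs."""
    launch: dict[str, Any] = {"headless": cfg.get("headless", True)}
    if cfg.get("args"):
        launch["args"] = list(cfg["args"])
    if cfg.get("proxy"):
        # Playwright's proxy dict accepts server, bypass, username and password
        allowed = ("server", "bypass", "username", "password")
        launch["proxy"] = {k: v for k, v in cfg["proxy"].items() if k in allowed}

    context: dict[str, Any] = {}
    if cfg.get("window_size"):
        # "1920x1080" -> {"width": 1920, "height": 1080}
        width, height = (int(n) for n in cfg["window_size"].lower().split("x"))
        context["viewport"] = {"width": width, "height": height}
    # Config key -> Playwright keyword (only the time zone is renamed)
    renames = {
        "device_scale_factor": "device_scale_factor",
        "user_agent": "user_agent",
        "locale": "locale",
        "timezone": "timezone_id",
    }
    for key, pw_key in renames.items():
        if key in cfg:
            context[pw_key] = cfg[key]
    return launch, context
```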

Handling Authentication

For private repositories with the browser method:

  1. Manual Login (Recommended for first use):

    Bash
    # Run in non-headless mode
    gitbridge sync --method browser --no-headless
    # Log in manually when browser opens
    # Cookies will be saved for future use
    

  2. Cookie Reuse:

    YAML
    browser:
      user_data_dir: ~/.gitbridge/browser  # Saves session data
      # Browser will reuse saved cookies automatically
    

  3. Basic Auth (if supported):

    YAML
    auth:
      username: your_username
      password: your_password  # Or use token as password
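
Manual login and cookie reuse can both be expressed with Playwright's persistent contexts, which store cookies in a profile directory. A sketch, assuming Chromium and the `~/.gitbridge/browser` profile path from the YAML above; `needs_manual_login` and `open_github_session` are hypothetical helpers, not GitBridge functions, and treating an empty profile directory as "first run" is only one heuristic.

```python
from pathlib import Path


def needs_manual_login(user_data_dir: str) -> bool:
    """True when no saved browser profile exists yet, so a visible login is needed."""
    profile = Path(user_data_dir).expanduser()
    return not profile.is_dir() or not any(profile.iterdir())


def open_github_session(user_data_dir: str = "~/.gitbridge/browser") -> None:
    """Open GitHub in a persistent Chromium profile so cookies survive between runs."""
    from playwright.sync_api import sync_playwright  # requires `pip install playwright`

    first_run = needs_manual_login(user_data_dir)  # check before launch creates the dir
    profile = str(Path(user_data_dir).expanduser())
    with sync_playwright() as p:
        # Show the window on the first run so the user can sign in by hand
        context = p.chromium.launch_persistent_context(profile, headless=not first_run)
        try:
            page = context.pages[0] if context.pages else context.new_page()
            page.goto("https://github.com/login")
            if first_run:
                input("Log in in the browser window, then press Enter here...")
        finally:
            context.close()  # close cleanly so the profile is written to disk
```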
    

Choosing the Right Method

Use API Method When:

  • ✅ You have network access to api.github.com
  • ✅ You need fast synchronization
  • ✅ You're syncing large repositories
  • ✅ You need incremental updates
  • ✅ You're automating syncs

Use Browser Method When:

  • ✅ API access is blocked by firewall
  • ✅ Only browser access to GitHub works
  • ✅ You need to handle complex authentication
  • ✅ You're syncing occasionally (not time-critical)
  • ✅ API rate limits are exhausted
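
Whether api.github.com is reachable at all can be checked up front. A rough sketch: a successful TCP connection on port 443 does not prove the API will answer (a proxy may still block requests, or the rate limit may be exhausted), so treat it only as a first filter. `can_reach` and `choose_method` are hypothetical helpers.

```python
import socket


def can_reach(host: str = "api.github.com", port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, refused connection, firewall timeout, ...
        return False


def choose_method(api_host: str = "api.github.com") -> str:
    """Pick 'api' when the API host is reachable, otherwise 'browser'."""
    return "api" if can_reach(api_host) else "browser"
```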

Method Comparison

Performance Comparison

| Aspect | API Method | Browser Method |
|---|---|---|
| Initial sync (1GB repo) | ~2 minutes | ~15 minutes |
| Incremental update | ~5 seconds | ~2 minutes |
| Memory usage | Low (50MB) | High (500MB+) |
| CPU usage | Low | Medium-High |
| Network efficiency | Excellent | Good |

Feature Comparison

| Feature | API Method | Browser Method |
|---|---|---|
| Incremental updates | ✅ Yes | ✅ Yes |
| Parallel downloads | ✅ Yes | ❌ No |
| Large file support | ⚠️ Limited | ✅ Yes |
| Authentication | Token | Browser session |
| Proxy support | ✅ Yes | ✅ Yes |
| Rate limiting | Yes (5,000 requests/hour with a token) | No |
| JavaScript required | ❌ No | ✅ Yes |

Fallback Strategy

Configure automatic fallback from API to browser method:

YAML
sync:
  method: auto  # Try API first, fallback to browser
  fallback_on_error: true

  # Primary method (API)
  api:
    retry_count: 2
    timeout: 30

  # Fallback method (Browser)  
  browser:
    headless: true
    timeout: 60

Implementation in code:

Python
from gitbridge.api_sync import GitHubAPISync
from gitbridge.browser_sync import GitHubBrowserSync

def sync_with_fallback(repo_url, local_path, token=None):
    """Sync via the API, falling back to the browser method if the API is unusable."""

    # Try the API method first
    try:
        api_sync = GitHubAPISync(repo_url, local_path, token)
        if api_sync.test_connection():
            return api_sync.sync()
        print("API connection test failed")
    except Exception as e:
        print(f"API sync failed: {e}")

    # Reached when the connection test fails or the API sync raised an error
    print("Falling back to browser method...")
    browser_sync = GitHubBrowserSync(
        repo_url=repo_url,
        local_path=local_path,
        browser_type="chromium",
        headless=True,
    )
    return browser_sync.sync()

Troubleshooting

API Method Issues

403 Forbidden

  • Check your token permissions
  • Verify token hasn't expired
  • Check API rate limits

Connection Timeout

  • Check firewall rules for api.github.com
  • Try using proxy configuration
  • Verify network connectivity

Browser Method Issues

Browser Not Found

  • Run playwright install chromium
  • Or specify executable_path explicitly
  • Check Playwright installation

Timeout Errors

  • Increase timeout values
  • Check network speed
  • Try non-headless mode for debugging

Best Practices

  1. Start with API method - It's faster and more efficient
  2. Use browser as fallback - Only when API is blocked
  3. Enable incremental mode - Save bandwidth and time
  4. Cache credentials securely - Use environment variables or secure stores
  5. Monitor rate limits - Use --verbose to track API usage
  6. Test both methods - Ensure fallback works before you need it

Next Steps