Synchronization Methods¶

GitBridge provides two methods for synchronizing GitHub repositories, so you can still reach your code when network restrictions block one of them.

Overview¶

Method	Speed	Requirements	Use Case
API	Fast ⚡	GitHub token (for private repos)	Default method, most efficient
Browser	Slow 🐢	Modern browser (Chrome/Firefox/Edge)	When API access is blocked

API Synchronization Method¶

The API method uses GitHub's REST API to efficiently sync repositories.

How It Works¶

sequenceDiagram
    participant GitBridge
    participant GitHub API
    participant Local Files

    GitBridge->>GitHub API: Request repository tree
    GitHub API-->>GitBridge: Return file list with SHAs
    GitBridge->>Local Files: Compare with cached SHAs
    GitBridge->>GitHub API: Request changed files only
    GitHub API-->>GitBridge: Return file contents
    GitBridge->>Local Files: Save updated files
    GitBridge->>Local Files: Update metadata cache
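
The comparison step in the diagram above amounts to a dictionary diff. The sketch below is illustrative only and is not GitBridge's actual code: the function name `files_to_download` and the shape of the two mappings (file path to SHA) are assumptions made for the example.

```python
def files_to_download(remote_tree, cached_shas):
    """Return the paths whose remote SHA differs from the cached one.

    Both arguments are assumed to be plain dicts mapping a repository
    path to the SHA reported for that file.
    """
    return sorted(
        path
        for path, sha in remote_tree.items()
        if cached_shas.get(path) != sha  # new file or changed content
    )


# Hypothetical data: one unchanged file, one changed file, one new file.
remote = {"README.md": "a1", "src/app.py": "b2", "docs/index.md": "c3"}
cached = {"README.md": "a1", "src/app.py": "old"}
print(files_to_download(remote, cached))
```

Only the paths returned here would need a content request, which is what keeps an incremental sync cheap on a repository where little has changed.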

Features¶

Incremental Updates: Only downloads changed files
Parallel Downloads: Multiple files downloaded simultaneously
SHA Verification: Ensures file integrity
Efficient for Large Repos: Minimal bandwidth usage
Branch/Tag Support: Sync any ref (branch, tag, commit)
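
The SHAs that GitHub reports for files are Git blob SHA-1 hashes, computed over a short header followed by the file content. A downloaded file can therefore be checked locally. The sketch below shows that calculation; the helper names are hypothetical, and whether GitBridge verifies files in exactly this way is an assumption.

```python
import hashlib


def git_blob_sha(content: bytes) -> str:
    """Compute the SHA-1 that Git assigns to a blob with this content."""
    # Git hashes "blob <size>\0" followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def verify_download(content: bytes, expected_sha: str) -> bool:
    """Return True if the downloaded bytes match the expected blob SHA."""
    return git_blob_sha(content) == expected_sha


print(git_blob_sha(b"hello\n"))
```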

Configuration¶

Command Line

Bash
gitbridge sync \
  --repo https://github.com/user/repo \
  --local ~/projects/repo \
  --method api \
  --token YOUR_TOKEN

Configuration File

YAML
sync:
  method: api
  incremental: true
  parallel_downloads: 5
  chunk_size: 1048576  # 1MB chunks

auth:
  token: ${GITHUB_TOKEN}

API Rate Limits¶

The GitHub API enforces rate limits:

Unauthenticated: 60 requests/hour
Authenticated: 5,000 requests/hour
GitHub Enterprise: Varies by installation

Avoiding Rate Limits

Always use an authentication token
Enable incremental mode
Cache metadata between syncs
Use --verbose to monitor API usage
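
GitHub reports the remaining quota in the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers. As a rough sketch of how a client could use them, the function below returns how long to pause before the next request. The function name is hypothetical and this is not a description of GitBridge's internals.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request (0 if quota remains)."""
    now = time.time() if now is None else now
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", 1))
    reset_at = int(lowered.get("x-ratelimit-reset", 0))  # Unix epoch seconds
    if remaining > 0:
        return 0
    return max(0, reset_at - int(now))


# Hypothetical response headers: quota exhausted, reset 60 seconds from "now".
example = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"}
print(seconds_until_reset(example, now=1700000000))
```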

Advanced Options¶

YAML
sync:
  method: api

  # Performance tuning
  parallel_downloads: 10      # Number of concurrent downloads
  chunk_size: 2097152         # 2MB chunks for large files
  retry_count: 3              # Retry failed downloads
  retry_delay: 5              # Seconds between retries

  # Filtering
  ignore_patterns:            # Files to skip
    - "*.log"
    - ".DS_Store"
    - "node_modules/"

  # Large files
  skip_large_files: true      # Skip files > 100MB
  large_file_size: 104857600  # 100MB threshold
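
To make the filtering options concrete, here is one plausible way such rules could be evaluated. This is a sketch under assumptions: the function `should_skip` is hypothetical, and the matching semantics (glob patterns applied to the file name, patterns ending in `/` treated as directory names) are a guess rather than GitBridge's documented behavior.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def should_skip(path, size, ignore_patterns,
                skip_large_files=True, large_file_size=104857600):
    """Decide whether a repository file should be left out of the sync."""
    parts = PurePosixPath(path).parts
    for pattern in ignore_patterns:
        if pattern.endswith("/"):
            # Directory pattern: skip anything below a folder with that name.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatch(parts[-1], pattern):
            # File pattern: match against the file name only.
            return True
    return skip_large_files and size > large_file_size


patterns = ["*.log", ".DS_Store", "node_modules/"]
print(should_skip("logs/app.log", 10, patterns))
print(should_skip("src/main.py", 10, patterns))
```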

Browser Automation Method¶

The browser method uses Playwright to automate a real browser, mimicking manual repository browsing.

How It Works¶

sequenceDiagram
    participant GitBridge
    participant Browser
    participant GitHub Website
    participant Local Files

    GitBridge->>Browser: Launch browser (Playwright)
    Browser->>GitHub Website: Navigate to repository
    GitBridge->>Browser: Click "Download ZIP"
    Browser->>GitHub Website: Request ZIP file
    GitHub Website-->>Browser: Return ZIP data
    GitBridge->>Browser: Extract file list from ZIP
    GitBridge->>Local Files: Compare with existing files
    GitBridge->>Browser: Download changed files
    GitBridge->>Local Files: Save updated files
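
The "compare with existing files" step can be illustrated with the standard library alone. The sketch below assumes the ZIP has already been downloaded and compares each entry against a local directory by content hash. The function name is hypothetical, and GitBridge may compare files differently.

```python
import hashlib
import io
import zipfile
from pathlib import Path


def changed_files_in_zip(zip_bytes: bytes, local_dir: Path) -> list[str]:
    """List archive paths whose content is missing or different in local_dir."""
    changed = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # GitHub wraps the repository in one top-level folder
            # (for example "repo-main/"), so drop the first path component.
            _, _, relative = info.filename.partition("/")
            if not relative:
                continue
            local_file = local_dir / relative
            remote_digest = hashlib.sha256(archive.read(info)).digest()
            if (not local_file.is_file()
                    or hashlib.sha256(local_file.read_bytes()).digest() != remote_digest):
                changed.append(relative)
    return sorted(changed)
```

Given the archive bytes and the sync target directory, the function returns the relative paths that would need to be written, which can then be extracted selectively.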

Features¶

Works Anywhere: If you can browse GitHub, it works
No API Token Required: Uses browser session
Handles JavaScript: Works with dynamic content
Cookie Support: Can use existing browser sessions
Proxy Aware: Uses browser's proxy settings

Configuration¶

Command Line

Bash
gitbridge sync \
  --repo https://github.com/user/repo \
  --local ~/projects/repo \
  --method browser \
  --browser-path /usr/bin/chromium

Configuration File

YAML
sync:
  method: browser

browser:
  type: chromium             # chromium, firefox, or webkit
  executable_path: /usr/bin/chromium  # Optional: custom browser path
  headless: true            # Run without GUI
  timeout: 30000            # Page load timeout (ms)
  download_timeout: 300000  # File download timeout (ms)

Browser Setup¶

Browser Installation¶

Playwright can automatically install browsers for you:

Automatic Installation

Bash
# Install Playwright browsers (recommended)
playwright install chromium
playwright install firefox
playwright install webkit

# Or install all browsers
playwright install

Manual Installation

Bash
# Windows (using winget)
winget install Google.Chrome
winget install Mozilla.Firefox

# macOS (using Homebrew)
brew install --cask google-chrome
brew install --cask firefox

# Linux (Ubuntu/Debian)
sudo apt-get install chromium-browser
sudo apt-get install firefox

Browser Management¶

Playwright manages browser binaries automatically, so no separate driver installation is needed.

Advanced Options¶

YAML
sync:
  method: browser

browser:
  type: chromium              # chromium, firefox, or webkit
  headless: true              # Run without GUI (faster)

  # Window settings
  window_size: "1920x1080"    # Browser window size
  device_scale_factor: 1      # Device pixel ratio

  # Performance
  args:
    - "--disable-gpu"         # Disable GPU acceleration
    - "--no-sandbox"          # Required for some environments
    - "--disable-dev-shm-usage"  # Overcome limited resource problems

  # Timeouts (in milliseconds)
  timeout: 30000              # Default timeout for operations
  download_timeout: 300000    # File download timeout

  # Browser context options
  user_agent: "GitBridge/1.0"   # Custom user agent
  locale: "en-US"             # Browser locale
  timezone: "America/New_York" # Browser timezone

  # Storage
  user_data_dir: ~/.gitbridge/browser  # Persistent browser data

  # Proxy (if not using system proxy)
  proxy:
    server: "http://proxy:8080"
    username: "proxy_user"
    password: "proxy_pass"
    bypass: "localhost,127.0.0.1"
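
Most of these options correspond to parameters that Playwright itself accepts, split between launching the browser and creating a browser context. The sketch below shows one way a `browser:` mapping could be translated. The function `playwright_kwargs` is hypothetical and the mapping is an assumption about how GitBridge might pass the values on. The Playwright parameter names used (`headless`, `executable_path`, `args`, `proxy`, `viewport`, `user_agent`, `locale`, `timezone_id`, `device_scale_factor`) are real.

```python
def playwright_kwargs(browser_cfg):
    """Split a `browser:` config mapping into (launch kwargs, context kwargs)."""
    launch = {"headless": browser_cfg.get("headless", True)}
    for key in ("executable_path", "args", "proxy"):
        if key in browser_cfg:
            launch[key] = browser_cfg[key]

    context = {}
    if "window_size" in browser_cfg:
        # "1920x1080" -> {"width": 1920, "height": 1080}
        width, height = (int(n) for n in browser_cfg["window_size"].split("x"))
        context["viewport"] = {"width": width, "height": height}
    # Config key -> Playwright context parameter name.
    renames = {
        "user_agent": "user_agent",
        "locale": "locale",
        "timezone": "timezone_id",
        "device_scale_factor": "device_scale_factor",
    }
    for key, target in renames.items():
        if key in browser_cfg:
            context[target] = browser_cfg[key]
    return launch, context


cfg = {
    "type": "chromium",
    "headless": True,
    "window_size": "1920x1080",
    "timezone": "America/New_York",
    "args": ["--disable-gpu"],
}
launch_kwargs, context_kwargs = playwright_kwargs(cfg)
print(launch_kwargs, context_kwargs)

# With Playwright installed, the results could be used like this:
#   from playwright.sync_api import sync_playwright
#   with sync_playwright() as p:
#       browser = getattr(p, cfg["type"]).launch(**launch_kwargs)
#       context = browser.new_context(**context_kwargs)
```

A persistent profile such as `user_data_dir` does not fit this split, because Playwright takes it through `launch_persistent_context` rather than `launch`.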

Handling Authentication¶

For private repositories, the browser method supports the following approaches:

Manual Login (Recommended for first use):

Bash
# Run in non-headless mode
gitbridge sync --method browser --no-headless
# Log in manually when browser opens
# Cookies will be saved for future use

Cookie Reuse:

YAML
browser:
  user_data_dir: ~/.gitbridge/browser  # Saves session data
  # Browser will reuse saved cookies automatically

Basic Auth (if supported):

YAML
auth:
  username: your_username
  password: your_password  # Or use token as password

Choosing the Right Method¶

Use API Method When:¶

✅ You have network access to api.github.com
✅ You need fast synchronization
✅ You're syncing large repositories
✅ You need incremental updates
✅ You're automating syncs

Use Browser Method When:¶

✅ API access is blocked by firewall
✅ Only browser access to GitHub works
✅ You need to handle complex authentication
✅ You're syncing occasionally (not time-critical)
✅ API rate limits are exhausted
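
If you are unsure whether a firewall blocks the API, a quick probe can answer the first question on each list. The sketch below uses only the standard library. The function name is hypothetical, and the `opener` parameter exists only so the probe can be exercised without network access.

```python
import urllib.error
import urllib.request


def api_reachable(url="https://api.github.com", timeout=5,
                  opener=urllib.request.urlopen):
    """Return True if the API host answers at all, False if the connection fails."""
    try:
        with opener(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (for example 403),
        # so the network path itself is open.
        return True
    except OSError:
        # DNS failure, refused connection, timeout, or a TLS error.
        return False
```

A `False` result suggests that the browser method is the one to try. A `True` result only shows that the host is reachable and says nothing about token permissions or remaining rate limit.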

Method Comparison¶

Performance Comparison¶

Aspect	API Method	Browser Method
Initial sync (1GB repo)	~2 minutes	~15 minutes
Incremental update	~5 seconds	~2 minutes
Memory usage	Low (50MB)	High (500MB+)
CPU usage	Low	Medium-High
Network efficiency	Excellent	Good

Feature Comparison¶

Feature	API Method	Browser Method
Incremental updates	✅ Yes	✅ Yes
Parallel downloads	✅ Yes	❌ No
Large file support	⚠️ Limited	✅ Yes
Authentication	Token	Browser session
Proxy support	✅ Yes	✅ Yes
Rate limiting	Yes (5k/hour)	No
JavaScript required	❌ No	✅ Yes

Fallback Strategy¶

Configure automatic fallback from API to browser method:

YAML
sync:
  method: auto  # Try API first, fall back to browser
  fallback_on_error: true

  # Primary method (API)
  api:
    retry_count: 2
    timeout: 30

  # Fallback method (Browser)
  browser:
    headless: true
    timeout: 60

Implementation in code:

Python
from gitbridge.api_sync import GitHubAPISync
from gitbridge.browser_sync import GitHubBrowserSync

def sync_with_fallback(repo_url, local_path, token=None):
    """Sync with automatic fallback to browser method."""

    # Try API method first
    try:
        api_sync = GitHubAPISync(repo_url, local_path, token)
        if api_sync.test_connection():
            return api_sync.sync()
        print("API connection test failed")
    except Exception as e:
        print(f"API sync failed: {e}")

    # Reached when the connection test fails or the API sync raises,
    # so the fallback runs in both cases.
    print("Falling back to browser method...")
    browser_sync = GitHubBrowserSync(
        repo_url=repo_url,
        local_path=local_path,
        browser_type="chromium",
        headless=True
    )
    return browser_sync.sync()

Troubleshooting¶

API Method Issues¶

403 Forbidden

Check your token permissions
Verify token hasn't expired
Check API rate limits

Connection Timeout

Check firewall rules for api.github.com
Try using proxy configuration
Verify network connectivity

Browser Method Issues¶

Browser Not Found

Run playwright install chromium
Or specify executable_path explicitly
Check Playwright installation

Timeout Errors

Increase timeout values
Check network speed
Try non-headless mode for debugging

Best Practices¶

Start with API method - It's faster and more efficient
Use browser as fallback - Only when API is blocked
Enable incremental mode - Save bandwidth and time
Cache credentials securely - Use environment variables or secure stores
Monitor rate limits - Use --verbose to track API usage
Test both methods - Ensure fallback works before you need it

Next Steps¶

Learn about Authentication options
Configure for Corporate Environments
Set up Proxy Configuration
Understand Incremental Sync mechanics