HTTP Client Issue: jsdelivr CDN URLs Fail with "Content-Length Header missing from response" #569

@mattijn

Hi Jon, I tried to figure out why VegaFusion cannot load the URLs in the examples mentioned in PR vega/altair#3859 (comment). It seems to be related to the use of jsdelivr as the CDN.

The remainder of this issue description was auto-generated:


VegaFusion's HTTP client (which uses the object_store crate) fails to access jsdelivr CDN URLs with the error Generic HTTP error: Content-Length Header missing from response. This affects users trying to load vega-datasets from jsdelivr URLs.

Problem Description

When VegaFusion attempts to load data from jsdelivr CDN URLs (e.g., https://cdn.jsdelivr.net/npm/vega-datasets@v3.2.0/data/stocks.csv), it fails with:

DataFusion error: Object Store error: Generic HTTP error: Content-Length Header missing from response

However, the same URLs work with other HTTP clients (reqwest, curl, etc.), and other CDNs (unpkg, GitHub raw) work correctly with VegaFusion.

Minimal Reproduction

I've created a minimal reproduction at: https://github.com/mattijn/vegafusion-http-test/tree/main

git clone https://github.com/mattijn/vegafusion-http-test.git
cd vegafusion-http-test
cargo run

Expected output:

=== VegaFusion HTTP Client Issue Reproduction ===

Testing jsdelivr CDN: https://cdn.jsdelivr.net/npm/vega-datasets@v3.2.0/data/stocks.csv
  ❌ Failed with error: Generic HTTP error: Content-Length Header missing from response

Testing unpkg CDN: https://unpkg.com/vega-datasets@3.2.0/data/stocks.csv
  ✅ Success! Content length: 12245 bytes

Testing github-raw CDN: https://raw.githubusercontent.com/vega/vega-datasets/main/data/stocks.csv
  ✅ Success! Content length: 12245 bytes

Root Cause

The issue is in the object_store crate's HTTP client implementation:

  1. jsdelivr CDN doesn't provide a Content-Length header in the format that object_store expects
  2. Other CDNs (unpkg, GitHub raw) provide the header correctly
  3. Other HTTP clients (reqwest, curl, etc.) work fine with jsdelivr
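To make the failure mode concrete, the sketch below models the kind of check that produces this error: look up Content-Length (header names are case-insensitive) and parse it as a byte count. It is a simplified illustration written for this issue, not object_store's actual code, and the header sets in main are made up. One plausible way a response ends up without the header is chunked transfer encoding, since HTTP/1.1 forbids sending Content-Length together with Transfer-Encoding; whether that is what jsdelivr does here is an assumption worth checking by inspecting its response headers (e.g. with curl -i).

```rust
/// Simplified model of the check behind the error: find `Content-Length`
/// (case-insensitively) and parse it as a byte count.
/// Illustration only; this is not object_store's implementation.
fn content_length(headers: &[(&str, &str)]) -> Result<u64, String> {
    let value = headers
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case("content-length"))
        .map(|(_, value)| *value)
        .ok_or_else(|| "Content-Length Header missing from response".to_string())?;
    value
        .trim()
        .parse::<u64>()
        .map_err(|e| format!("invalid Content-Length {value:?}: {e}"))
}

fn main() {
    // Header sets are made up; 12245 is the stocks.csv size reported above.
    let with_length = [("Content-Type", "text/csv"), ("Content-Length", "12245")];
    let chunked = [("Content-Type", "text/csv"), ("Transfer-Encoding", "chunked")];

    println!("{:?}", content_length(&with_length)); // Ok(12245)
    println!("{:?}", content_length(&chunked));
    // Err("Content-Length Header missing from response")
}
```

A client that streams the body until the connection or final chunk ends (as curl and reqwest do by default) does not need this header, which is consistent with those clients succeeding on the same URL.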

Impact

  • Users cannot load vega-datasets from jsdelivr URLs
  • This affects any VegaFusion application that relies on jsdelivr CDN
  • Working around it requires URL rewriting or switching to an alternative CDN

Suggested Fix

Option 1: URL Rewriting (Recommended)

Add URL rewriting logic in VegaFusion to automatically convert jsdelivr URLs to unpkg URLs:

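// In vegafusion-http/src/object_store.rs or similar
/// Rewrite a jsdelivr npm URL to the equivalent unpkg URL.
/// jsdelivr serves npm packages under /npm/<pkg>@<version>/..., while unpkg
/// serves them directly under /<pkg>@<version>/..., so the /npm/ segment
/// must be dropped along with the host. The version specifier is kept as-is.
fn rewrite_jsdelivr_url(url: &str) -> String {
    const JSDELIVR_NPM: &str = "https://cdn.jsdelivr.net/npm/";
    const UNPKG: &str = "https://unpkg.com/";
    match url.strip_prefix(JSDELIVR_NPM) {
        Some(rest) => format!("{}{}", UNPKG, rest),
        None => url.to_string(),
    }
}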

Option 2: Enhanced HTTP Client Configuration

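Configure the object_store HTTP client through ClientOptions. Note that the two options below only permit plain HTTP and invalid TLS certificates; neither relaxes the Content-Length requirement, so this configuration alone is unlikely to resolve the error:

use object_store::{http::HttpBuilder, ClientOptions};

let object_store = HttpBuilder::new()
    .with_url(url)
    .with_client_options(
        ClientOptions::new()
            .with_allow_http(true)
            .with_allow_invalid_certificates(true), // If needed; disables TLS checks
    )
    .build()?;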

Option 3: Fallback HTTP Client

Implement a fallback mechanism that tries object_store first, then falls back to reqwest for jsdelivr URLs:

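// get_with_object_store and get_with_reqwest are assumed helpers (not shown)
// that fetch the full response body of `url` with the respective client.
type Result<T> = std::result::Result<T, Box<dyn std::error::Error + Send + Sync>>;

async fn get_data_with_fallback(url: &str) -> Result<Vec<u8>> {
    // Try object_store first
    match get_with_object_store(url).await {
        Ok(data) => Ok(data),
        // object_store reports this as a generic HTTP error, so match on the
        // message text (fragile, but enough for a targeted fallback)
        Err(e) if e.to_string().contains("Content-Length") => {
            // Fall back to reqwest, which reads the jsdelivr response fine
            get_with_reqwest(url).await
        }
        Err(e) => Err(e),
    }
}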
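Because the fallback logic is easiest to verify in isolation, here is a std-only, synchronous sketch of the same control flow with the two fetchers passed in as closures. The name fetch_with_fallback and the stand-in fetchers are hypothetical; in VegaFusion the closures would wrap object_store and reqwest respectively, and the real versions would be async.

```rust
/// Std-only sketch of the Option 3 control flow. The fetchers are passed in
/// as closures so the branching can be exercised without network access.
fn fetch_with_fallback<P, F>(url: &str, primary: P, fallback: F) -> Result<Vec<u8>, String>
where
    P: Fn(&str) -> Result<Vec<u8>, String>,
    F: Fn(&str) -> Result<Vec<u8>, String>,
{
    match primary(url) {
        Ok(data) => Ok(data),
        // Only the missing-header failure triggers the fallback; any other
        // error (404, DNS failure, ...) is returned unchanged.
        Err(e) if e.contains("Content-Length") => fallback(url),
        Err(e) => Err(e),
    }
}

fn main() {
    // Hypothetical stand-ins: the primary fails on jsdelivr the way
    // object_store does in this issue; the fallback always succeeds.
    let primary = |url: &str| -> Result<Vec<u8>, String> {
        if url.contains("cdn.jsdelivr.net") {
            Err("Generic HTTP error: Content-Length Header missing from response".to_string())
        } else {
            Ok(b"csv bytes".to_vec())
        }
    };
    let fallback = |_url: &str| -> Result<Vec<u8>, String> { Ok(b"csv bytes".to_vec()) };

    let url = "https://cdn.jsdelivr.net/npm/vega-datasets@v3.2.0/data/stocks.csv";
    let body = fetch_with_fallback(url, primary, fallback).expect("fallback should succeed");
    println!("{} bytes", body.len());
}
```

Matching on the error text keeps the fallback narrow, but it will silently stop triggering if the upstream error message changes, so a test like the one above is worth keeping alongside it.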

Environment

  • VegaFusion version: Latest (using object_store 0.12.3)
  • Rust version: 1.88.0
  • Platform: macOS (aarch64), but likely affects all platforms
  • CDN: jsdelivr.net

Related Issues

  • This is likely related to the object_store crate's HTTP client implementation
  • Similar issues may exist with other CDNs that don't provide expected headers
  • vega-datasets uses jsdelivr as its primary CDN

Workaround

Until this is fixed, users can:

  1. Replace jsdelivr URLs with unpkg URLs:

    # Instead of:
    url = "https://cdn.jsdelivr.net/npm/vega-datasets@v3.2.0/data/stocks.csv"
    
    # Use:
    url = "https://unpkg.com/vega-datasets@3.2.0/data/stocks.csv"
  2. Use GitHub raw URLs:

    url = "https://raw.githubusercontent.com/vega/vega-datasets/main/data/stocks.csv"
  3. Download files locally and use file:// URLs
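The URL patterns in these workarounds differ per source in small ways: jsdelivr adds an /npm/ segment and, in these examples, a v before the version, while GitHub raw uses a branch instead of a version. A small helper can keep them in one place. The Source enum and dataset_url are hypothetical names for illustration; the patterns are copied from the URLs in this issue.

```rust
/// Where to fetch a vega-datasets file from (hypothetical helper type).
#[derive(Clone, Copy, Debug)]
enum Source {
    Jsdelivr,
    Unpkg,
    GithubRaw,
}

/// Build a dataset URL following the patterns used in this issue.
fn dataset_url(source: Source, version: &str, file: &str) -> String {
    match source {
        // jsdelivr: /npm/ prefix and a `v` before the version, as in the examples.
        Source::Jsdelivr => {
            format!("https://cdn.jsdelivr.net/npm/vega-datasets@v{version}/data/{file}")
        }
        // unpkg: package@version directly under the host.
        Source::Unpkg => format!("https://unpkg.com/vega-datasets@{version}/data/{file}"),
        // GitHub raw: the URL here tracks the `main` branch, so `version` is unused.
        Source::GithubRaw => {
            format!("https://raw.githubusercontent.com/vega/vega-datasets/main/data/{file}")
        }
    }
}

fn main() {
    for source in [Source::Jsdelivr, Source::Unpkg, Source::GithubRaw] {
        println!("{source:?}: {}", dataset_url(source, "3.2.0", "stocks.csv"));
    }
}
```

Note that the GitHub raw URL tracks main, so unlike the pinned unpkg URL its contents can change as the repository evolves.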

Additional Context

This issue was discovered while working with Altair and vega-datasets. The vega-datasets package uses jsdelivr as its primary CDN, which makes this a common issue for VegaFusion users.
