HTTP Client Issue: jsdelivr CDN URLs Fail with "Content-Length Header missing from response" #569
Description
Hi Jon, I tried to decipher why VegaFusion cannot parse the URLs in the examples mentioned in this PR vega/altair#3859 (comment). It seems to be related to jsdelivr being the CDN.
The remainder of the issue description was auto-generated:
VegaFusion's HTTP client (using object_store crate) fails to access jsdelivr CDN URLs with the error Generic HTTP error: Content-Length Header missing from response. This affects users trying to load vega-datasets from jsdelivr URLs.
Problem Description
When VegaFusion attempts to load data from jsdelivr CDN URLs (e.g., https://cdn.jsdelivr.net/npm/vega-datasets@v3.2.0/data/stocks.csv), it fails with:
DataFusion error: Object Store error: Generic HTTP error: Content-Length Header missing from response
However, the same URLs work fine with other HTTP clients (reqwest, curl, etc.) and other CDNs (unpkg, GitHub raw) work correctly with VegaFusion.
Minimal Reproduction
I've created a minimal reproduction at: https://github.com/mattijn/vegafusion-http-test/tree/main
git clone https://github.com/mattijn/vegafusion-http-test.git
cd vegafusion-http-test
cargo run
Expected output:
=== VegaFusion HTTP Client Issue Reproduction ===
Testing jsdelivr CDN: https://cdn.jsdelivr.net/npm/vega-datasets@v3.2.0/data/stocks.csv
❌ Failed with error: Generic HTTP error: Content-Length Header missing from response
Testing unpkg CDN: https://unpkg.com/vega-datasets@3.2.0/data/stocks.csv
✅ Success! Content length: 12245 bytes
Testing github-raw CDN: https://raw.githubusercontent.com/vega/vega-datasets/main/data/stocks.csv
✅ Success! Content length: 12245 bytes
Root Cause
The issue is in the object_store crate's HTTP client implementation:
- jsdelivr CDN doesn't provide a Content-Length header in the format that object_store expects
- Other CDNs (unpkg, GitHub raw) provide the header correctly
- Other HTTP clients (reqwest, curl, etc.) work fine with jsdelivr
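To make the difference between a strict and a lenient client concrete, here is a minimal, std-only sketch. It is not the object_store implementation: the function names and the header sets are hypothetical, and only the error text is taken from this issue. It also assumes the failing response simply carries no Content-Length header (for example because it is sent with chunked transfer encoding), which this issue does not confirm.

```rust
/// Look up a header by case-insensitive name in a list of (name, value) pairs.
fn header<'a>(headers: &'a [(&'a str, &'a str)], name: &str) -> Option<&'a str> {
    headers
        .iter()
        .find(|(k, _)| k.eq_ignore_ascii_case(name))
        .map(|(_, v)| *v)
}

/// Strict policy: a missing or unparsable Content-Length is an error.
fn strict_content_length(headers: &[(&str, &str)]) -> Result<u64, String> {
    let raw = header(headers, "content-length")
        .ok_or_else(|| "Content-Length Header missing from response".to_string())?;
    raw.trim()
        .parse::<u64>()
        .map_err(|e| format!("Invalid Content-Length {raw:?}: {e}"))
}

/// Lenient policy: a missing header only means the size is unknown
/// until the body has been read to the end.
fn lenient_content_length(headers: &[(&str, &str)]) -> Option<u64> {
    header(headers, "content-length").and_then(|v| v.trim().parse().ok())
}

fn main() {
    // Hypothetical header sets, for illustration only.
    let with_length = [("Content-Type", "text/csv"), ("Content-Length", "12245")];
    let chunked = [("Content-Type", "text/csv"), ("Transfer-Encoding", "chunked")];

    println!("strict, with length:  {:?}", strict_content_length(&with_length));
    println!("strict, chunked:      {:?}", strict_content_length(&chunked));
    println!("lenient, chunked:     {:?}", lenient_content_length(&chunked));
}
```

Under these assumptions, the strict function rejects the second header set while the lenient one accepts it, which would match the observation that reqwest and curl succeed where the object_store client fails.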
Impact
- Users cannot load vega-datasets from jsdelivr URLs
- This affects any VegaFusion application that relies on jsdelivr CDN
- Workaround requires URL rewriting or using alternative CDNs
Suggested Fix
Option 1: URL Rewriting (Recommended)
Add URL rewriting logic in VegaFusion to automatically convert jsdelivr URLs to unpkg URLs:
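// In vegafusion-http/src/object_store.rs or similar
fn rewrite_jsdelivr_url(url: &str) -> String {
    if url.contains("cdn.jsdelivr.net/npm/") {
        // unpkg serves npm packages from the root, so the /npm/ prefix must be dropped too.
        // A leading "v" in the version (e.g. @v3.2.0) may also need stripping.
        url.replace("cdn.jsdelivr.net/npm/", "unpkg.com/")
    } else {
        url.to_string()
    }
}
Option 2: Enhanced HTTP Client Configuration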
Configure the object_store HTTP client to be more lenient with missing headers:
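use object_store::{http::HttpBuilder, ClientOptions};
// These options relax TLS and scheme checks; they are not known to change Content-Length handling
let client_options = ClientOptions::new()
    .with_allow_invalid_certificates(true) // If needed
    .with_allow_http(true);
let object_store = HttpBuilder::new()
    .with_url(url)
    .with_client_options(client_options)
    .build()?;
Option 3: Fallback HTTP Client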
Implement a fallback mechanism that tries object_store first, then falls back to reqwest for jsdelivr URLs:
async fn get_data_with_fallback(url: &str) -> Result<Vec<u8>> {
    // Try object_store first
    match get_with_object_store(url).await {
        Ok(data) => Ok(data),
        Err(e) if e.to_string().contains("Content-Length") => {
            // Fallback to reqwest for jsdelivr URLs
            get_with_reqwest(url).await
        }
        Err(e) => Err(e),
    }
}
Environment
- VegaFusion version: Latest (using object_store 0.12.3)
- Rust version: 1.88.0
- Platform: macOS (aarch64), but likely affects all platforms
- CDN: jsdelivr.net
Related Issues
- This is likely related to the object_store crate's HTTP client implementation
- Similar issues may exist with other CDNs that don't provide expected headers
- vega-datasets uses jsdelivr as their primary CDN
Workaround
Until this is fixed, users can:
- Replace jsdelivr URLs with unpkg URLs:
  # Instead of:
  url = "https://cdn.jsdelivr.net/npm/vega-datasets@v3.2.0/data/stocks.csv"
  # Use:
  url = "https://unpkg.com/vega-datasets@3.2.0/data/stocks.csv"
- Use GitHub raw URLs:
  url = "https://raw.githubusercontent.com/vega/vega-datasets/main/data/stocks.csv"
- Download files locally and use file:// URLs
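The first workaround above can also be applied programmatically. The following is a rough, std-only sketch of that rewrite; the function name jsdelivr_to_unpkg is hypothetical. It assumes the two URL shapes shown in this issue (a /npm/ path prefix on jsdelivr, none on unpkg) and that a leading "v" before a numeric version should be dropped, as in the example URLs. Scoped packages are deliberately not handled.

```rust
/// Rewrite a jsdelivr npm URL to the corresponding unpkg URL.
/// Returns None for anything this sketch does not recognise.
fn jsdelivr_to_unpkg(url: &str) -> Option<String> {
    const JSDELIVR_NPM: &str = "https://cdn.jsdelivr.net/npm/";
    let rest = url.strip_prefix(JSDELIVR_NPM)?;

    // Scoped packages ("@scope/name") are not handled in this sketch.
    if rest.starts_with('@') {
        return None;
    }

    // rest looks like "vega-datasets@v3.2.0/data/stocks.csv".
    let (spec, path) = rest.split_once('/')?;

    let spec = match spec.split_once('@') {
        Some((name, version)) => {
            // Assumption: "v3.2.0" should become "3.2.0"; tags such as "vnext" are left alone.
            let version = match version.strip_prefix('v') {
                Some(v) if v.starts_with(|c: char| c.is_ascii_digit()) => v,
                _ => version,
            };
            format!("{name}@{version}")
        }
        // No version given: keep the bare package name.
        None => spec.to_string(),
    };

    Some(format!("https://unpkg.com/{spec}/{path}"))
}

fn main() {
    let url = "https://cdn.jsdelivr.net/npm/vega-datasets@v3.2.0/data/stocks.csv";
    match jsdelivr_to_unpkg(url) {
        Some(rewritten) => println!("{rewritten}"),
        None => println!("not a jsdelivr npm URL: {url}"),
    }
}
```

Returning an Option rather than always returning a string keeps the caller in control: URLs that are not jsdelivr npm URLs can be passed through unchanged instead of being rewritten by accident.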
Additional Context
This issue was discovered while working with Altair and vega-datasets. The vega-datasets package uses jsdelivr as their primary CDN, which makes this a common issue for VegaFusion users.