Julia doesn't like Potatoes #626

Closed
yurivish opened this issue Nov 25, 2020 · 21 comments

@yurivish

yurivish commented Nov 25, 2020

HTTP.get("https://en.wikipedia.org/api/rest_v1/page/summary/Potatoes")

returns an error while

HTTP.get("https://en.wikipedia.org/api/rest_v1/page/summary/Potato")

does not. The difference appears to be that the former URL serves a redirect.

The error is

ArgumentError: merge(::HTTP.URIs.URI; scheme::SubString{String}, userinfo::SubString{String}, host::SubString{String}, port::SubString{String}, path::SubString{String}, query::SubString{String}, fragment::SubString{String}) requires !(scheme in ["http", "https"]) || (isempty(path) || path[1] == '/')

with stack trace

macro expansion@debug.jl:52[inlined]
#merge#4(::SubString{String}, ::SubString{String}, ::SubString{String}, ::SubString{String}, ::SubString{String}, ::SubString{String}, ::SubString{String}, ::typeof(merge), ::HTTP.URIs.URI)@URIs.jl:81
absuri@URIs.jl:421[inlined]
absuri(::SubString{String}, ::HTTP.URIs.URI)@URIs.jl:409
#request#1(::Int64, ::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(HTTP.request), ::Type{HTTP.RedirectRequest.RedirectLayer{HTTP.BasicAuthRequest.BasicAuthLayer{HTTP.MessageRequest.MessageLayer{HTTP.RetryRequest.RetryLayer{HTTP.ExceptionRequest.ExceptionLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer{Union{}}}}}}}}}, ::String, ::HTTP.URIs.URI, ::Array{Pair{SubString{String},SubString{String}},1}, ::Array{UInt8,1})@RedirectRequest.jl:35
request(::Type{HTTP.RedirectRequest.RedirectLayer{HTTP.BasicAuthRequest.BasicAuthLayer{HTTP.MessageRequest.MessageLayer{HTTP.RetryRequest.RetryLayer{HTTP.ExceptionRequest.ExceptionLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer{Union{}}}}}}}}}, ::String, ::HTTP.URIs.URI, ::Array{Pair{SubString{String},SubString{String}},1}, ::Array{UInt8,1})@RedirectRequest.jl:21
#request#4(::Array{Pair{SubString{String},SubString{String}},1}, ::Array{UInt8,1}, ::Nothing, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(HTTP.request), ::String, ::String, ::Array{Pair{SubString{String},SubString{String}},1}, ::Array{UInt8,1})@HTTP.jl:314
request@HTTP.jl:314[inlined]
#get#12@HTTP.jl:391[inlined]
get(::String)@HTTP.jl:391

The same error occurs with redirect=true:

HTTP.get("https://en.wikipedia.org/api/rest_v1/page/summary/Potatoes", redirect=true)

This is with Julia v1.5.3 and HTTP v0.8.19. Related: JuliaLang/julia#3721

@DilumAluthge
Member

I can reproduce this with Julia master and HTTP#master:

julia> HTTP.get("https://en.wikipedia.org/api/rest_v1/page/summary/Potatoes")
ERROR: ArgumentError: backtrace() requires !(scheme in ["http", "https"]) || (isempty(path) || path[1] == '/')
Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/URIs/1jrj1/src/debug.jl:52 [inlined]
  [2] URIs.URI(uri::URIs.URI; scheme::SubString{String}, userinfo::SubString{String}, host::SubString{String}, port::SubString{String}, path::SubString{String}, query::SubString{String}, fragment::SubString{String})
    @ URIs ~/.julia/packages/URIs/1jrj1/src/URIs.jl:71
  [3] absuri
    @ ~/.julia/packages/URIs/1jrj1/src/URIs.jl:458 [inlined]
  [4] absuri(u::SubString{String}, context::URIs.URI)
    @ URIs ~/.julia/packages/URIs/1jrj1/src/URIs.jl:440
  [5] request(::Type{HTTP.RedirectRequest.RedirectLayer{HTTP.BasicAuthRequest.BasicAuthLayer{HTTP.MessageRequest.MessageLayer{HTTP.RetryRequest.RetryLayer{HTTP.ExceptionRequest.ExceptionLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer{Union{}}}}}}}}}, method::String, url::URIs.URI, headers::Vector{Pair{SubString{String}, SubString{String}}}, body::Vector{UInt8}; redirect_limit::Int64, forwardheaders::Bool, kw::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ HTTP.RedirectRequest ~/.julia/packages/HTTP/l3rwh/src/RedirectRequest.jl:35
  [6] request(::Type{HTTP.RedirectRequest.RedirectLayer{HTTP.BasicAuthRequest.BasicAuthLayer{HTTP.MessageRequest.MessageLayer{HTTP.RetryRequest.RetryLayer{HTTP.ExceptionRequest.ExceptionLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer{Union{}}}}}}}}}, method::String, url::URIs.URI, headers::Vector{Pair{SubString{String}, SubString{String}}}, body::Vector{UInt8})
    @ HTTP.RedirectRequest ~/.julia/packages/HTTP/l3rwh/src/RedirectRequest.jl:21
  [7] request(method::String, url::String, h::Vector{Pair{SubString{String}, SubString{String}}}, b::Vector{UInt8}; headers::Vector{Pair{SubString{String}, SubString{String}}}, body::Vector{UInt8}, query::Nothing, kw::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ HTTP ~/.julia/packages/HTTP/l3rwh/src/HTTP.jl:313
  [8] request
    @ ~/.julia/packages/HTTP/l3rwh/src/HTTP.jl:313 [inlined]
  [9] #get#13
    @ ~/.julia/packages/HTTP/l3rwh/src/HTTP.jl:390 [inlined]
 [10] get(a::String)
    @ HTTP ~/.julia/packages/HTTP/l3rwh/src/HTTP.jl:390
 [11] top-level scope
    @ REPL[5]:1
Full details:
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.0-DEV.1581 (2020-11-26)
 _/ |\__'_|_|_|\__'_|  |  Commit 377aa809eb* (0 days old master)
|__/                   |

julia> import HTTP

julia> HTTP.get("https://en.wikipedia.org/api/rest_v1/page/summary/Potato");

julia> HTTP.get("https://en.wikipedia.org/api/rest_v1/page/summary/Potato"; redirect = false);

julia> HTTP.get("https://en.wikipedia.org/api/rest_v1/page/summary/Potato"; redirect = true);

julia> HTTP.get("https://en.wikipedia.org/api/rest_v1/page/summary/Potatoes")
ERROR: ArgumentError: backtrace() requires !(scheme in ["http", "https"]) || (isempty(path) || path[1] == '/')
Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/URIs/1jrj1/src/debug.jl:52 [inlined]
  [2] URIs.URI(uri::URIs.URI; scheme::SubString{String}, userinfo::SubString{String}, host::SubString{String}, port::SubString{String}, path::SubString{String}, query::SubString{String}, fragment::SubString{String})
    @ URIs ~/.julia/packages/URIs/1jrj1/src/URIs.jl:71
  [3] absuri
    @ ~/.julia/packages/URIs/1jrj1/src/URIs.jl:458 [inlined]
  [4] absuri(u::SubString{String}, context::URIs.URI)
    @ URIs ~/.julia/packages/URIs/1jrj1/src/URIs.jl:440
  [5] request(::Type{HTTP.RedirectRequest.RedirectLayer{HTTP.BasicAuthRequest.BasicAuthLayer{HTTP.MessageRequest.MessageLayer{HTTP.RetryRequest.RetryLayer{HTTP.ExceptionRequest.ExceptionLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer{Union{}}}}}}}}}, method::String, url::URIs.URI, headers::Vector{Pair{SubString{String}, SubString{String}}}, body::Vector{UInt8}; redirect_limit::Int64, forwardheaders::Bool, kw::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ HTTP.RedirectRequest ~/.julia/packages/HTTP/l3rwh/src/RedirectRequest.jl:35
  [6] request(::Type{HTTP.RedirectRequest.RedirectLayer{HTTP.BasicAuthRequest.BasicAuthLayer{HTTP.MessageRequest.MessageLayer{HTTP.RetryRequest.RetryLayer{HTTP.ExceptionRequest.ExceptionLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer{Union{}}}}}}}}}, method::String, url::URIs.URI, headers::Vector{Pair{SubString{String}, SubString{String}}}, body::Vector{UInt8})
    @ HTTP.RedirectRequest ~/.julia/packages/HTTP/l3rwh/src/RedirectRequest.jl:21
  [7] request(method::String, url::String, h::Vector{Pair{SubString{String}, SubString{String}}}, b::Vector{UInt8}; headers::Vector{Pair{SubString{String}, SubString{String}}}, body::Vector{UInt8}, query::Nothing, kw::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ HTTP ~/.julia/packages/HTTP/l3rwh/src/HTTP.jl:313
  [8] request
    @ ~/.julia/packages/HTTP/l3rwh/src/HTTP.jl:313 [inlined]
  [9] #get#13
    @ ~/.julia/packages/HTTP/l3rwh/src/HTTP.jl:390 [inlined]
 [10] get(a::String)
    @ HTTP ~/.julia/packages/HTTP/l3rwh/src/HTTP.jl:390
 [11] top-level scope
    @ REPL[5]:1

julia> HTTP.get("https://en.wikipedia.org/api/rest_v1/page/summary/Potatoes"; redirect = false)
HTTP.Messages.Response:
"""
HTTP/1.1 302 Moved Temporarily
etag: "784030667/d111d5f0-0795-11ea-a815-018920c57deb"
location: Potato
cache-control: s-maxage=1209600, max-age=300
vary: origin,X-Forwarded-Proto
access-control-allow-origin: *
access-control-allow-methods: GET,HEAD
access-control-allow-headers: accept, content-type, content-length, cache-control, accept-language, api-user-agent, if-match, if-modified-since, if-none-match, dnt, accept-encoding
access-control-expose-headers: etag
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
referrer-policy: origin-when-cross-origin
x-xss-protection: 1; mode=block
content-security-policy: default-src 'none'; frame-ancestors 'none'
x-content-security-policy: default-src 'none'; frame-ancestors 'none'
x-webkit-csp: default-src 'none'; frame-ancestors 'none'
x-request-id: 73f8d6df-9c05-424f-9bca-c2c355266f24
server: restbase2019
date: Thu, 26 Nov 2020 15:15:59 GMT
Age: 29008
X-Cache: cp2029 miss, cp2039 hit/28
X-Cache-Status: hit-front
Server-Timing: cache;desc="hit-front"
Strict-Transport-Security: max-age=106384710; includeSubDomains; preload
Report-To: { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
NEL: { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
Set-Cookie: WMF-Last-Access=26-Nov-2020;Path=/;HttpOnly;secure;Expires=Mon, 28 Dec 2020 12:00:00 GMT
Set-Cookie: WMF-Last-Access-Global=26-Nov-2020;Path=/;Domain=.wikipedia.org;HttpOnly;secure;Expires=Mon, 28 Dec 2020 12:00:00 GMT
X-Client-IP: 96.253.39.8
Set-Cookie: GeoIP=US:RI:Providence:41.83:-71.40:v4; Path=/; secure; Domain=.wikipedia.org
Content-Length: 0
Connection: keep-alive

"""

julia> HTTP.get("https://en.wikipedia.org/api/rest_v1/page/summary/Potatoes"; redirect = true)
ERROR: ArgumentError: backtrace() requires !(scheme in ["http", "https"]) || (isempty(path) || path[1] == '/')
Stacktrace:
 [1] macro expansion
   @ ~/.julia/packages/URIs/1jrj1/src/debug.jl:52 [inlined]
 [2] URIs.URI(uri::URIs.URI; scheme::SubString{String}, userinfo::SubString{String}, host::SubString{String}, port::SubString{String}, path::SubString{String}, query::SubString{String}, fragment::SubString{String})
   @ URIs ~/.julia/packages/URIs/1jrj1/src/URIs.jl:71
 [3] absuri
   @ ~/.julia/packages/URIs/1jrj1/src/URIs.jl:458 [inlined]
 [4] absuri(u::SubString{String}, context::URIs.URI)
   @ URIs ~/.julia/packages/URIs/1jrj1/src/URIs.jl:440
 [5] request(::Type{HTTP.RedirectRequest.RedirectLayer{HTTP.BasicAuthRequest.BasicAuthLayer{HTTP.MessageRequest.MessageLayer{HTTP.RetryRequest.RetryLayer{HTTP.ExceptionRequest.ExceptionLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer{Union{}}}}}}}}}, method::String, url::URIs.URI, headers::Vector{Pair{SubString{String}, SubString{String}}}, body::Vector{UInt8}; redirect_limit::Int64, forwardheaders::Bool, kw::Base.Iterators.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:redirect,), Tuple{Bool}}})
   @ HTTP.RedirectRequest ~/.julia/packages/HTTP/l3rwh/src/RedirectRequest.jl:35
 [6] request(method::String, url::String, h::Vector{Pair{SubString{String}, SubString{String}}}, b::Vector{UInt8}; headers::Vector{Pair{SubString{String}, SubString{String}}}, body::Vector{UInt8}, query::Nothing, kw::Base.Iterators.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:redirect,), Tuple{Bool}}})
   @ HTTP ~/.julia/packages/HTTP/l3rwh/src/HTTP.jl:313
 [7] #get#13
   @ ~/.julia/packages/HTTP/l3rwh/src/HTTP.jl:390 [inlined]
 [8] top-level scope
   @ REPL[7]:1

julia> versioninfo()
Julia Version 1.6.0-DEV.1581
Commit 377aa809eb* (2020-11-26 01:44 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.6.0)
  CPU: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.0 (ORCJIT, haswell)
Environment:
  JULIA_PKG_SERVER =

(@v1.6) pkg> st
Status `~/.julia/environments/v1.6/Project.toml`
  [cd3eb016] HTTP v0.9.0 `https://github.com/JuliaWeb/HTTP.jl.git#master`

(@v1.6) pkg> st -m
Status `~/.julia/environments/v1.6/Manifest.toml`
  [cd3eb016] HTTP v0.9.0 `https://github.com/JuliaWeb/HTTP.jl.git#master`
  [83e8ac13] IniFile v0.5.0
  [692b3bcd] JLLWrappers v1.1.3
  [739be429] MbedTLS v1.0.3
  [5c2747f8] URIs v1.1.0
  [c8ffd9c3] MbedTLS_jll v2.24.0+1
  [0dad84c5] ArgTools
  [56f22d72] Artifacts
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [f43a241f] Downloads
  [b77e0a4c] InteractiveUtils
  [b27032c2] LibCURL
  [76f85450] LibGit2
  [8f399da3] Libdl
  [56ddb016] Logging
  [d6f4376e] Markdown
  [ca575930] NetworkOptions
  [44cfe95a] Pkg
  [de0858da] Printf
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA
  [9e88b42a] Serialization
  [6462fe0b] Sockets
  [fa267f1f] TOML
  [a4e569a6] Tar
  [8dfed614] Test
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode
  [deac9b47] LibCURL_jll
  [14a3606d] MozillaCACerts_jll

@DilumAluthge
Member

I'm not sure if this is a bug in HTTP.jl or URIs.jl.

@DilumAluthge
Member

I can also reproduce this with Julia master, HTTP#master, and URIs#master.

@DilumAluthge
Member

DilumAluthge commented Nov 26, 2020

So, I'm starting to think that this is actually a bug in the Wikipedia API, not a problem with HTTP.jl or URIs.jl.

To see this, let's look at the output of HTTP.get("https://en.wikipedia.org/api/rest_v1/page/summary/Potatoes"; redirect=false):

julia> HTTP.get("https://en.wikipedia.org/api/rest_v1/page/summary/Potatoes"; redirect=false)
HTTP.Messages.Response:
"""
HTTP/1.1 302 Moved Temporarily
etag: "784030667/d111d5f0-0795-11ea-a815-018920c57deb"
location: Potato
cache-control: s-maxage=1209600, max-age=300
vary: origin,X-Forwarded-Proto
access-control-allow-origin: *
access-control-allow-methods: GET,HEAD
access-control-allow-headers: accept, content-type, content-length, cache-control, accept-language, api-user-agent, if-match, if-modified-since, if-none-match, dnt, accept-encoding
access-control-expose-headers: etag
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
referrer-policy: origin-when-cross-origin
x-xss-protection: 1; mode=block
content-security-policy: default-src 'none'; frame-ancestors 'none'
x-content-security-policy: default-src 'none'; frame-ancestors 'none'
x-webkit-csp: default-src 'none'; frame-ancestors 'none'
x-request-id: 73f8d6df-9c05-424f-9bca-c2c355266f24
server: restbase2019
date: Thu, 26 Nov 2020 15:15:59 GMT
Age: 30337
X-Cache: cp2029 miss, cp2039 hit/51
X-Cache-Status: hit-front
Server-Timing: cache;desc="hit-front"
Strict-Transport-Security: max-age=106384710; includeSubDomains; preload
Report-To: { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
NEL: { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
Set-Cookie: WMF-Last-Access=26-Nov-2020;Path=/;HttpOnly;secure;Expires=Mon, 28 Dec 2020 12:00:00 GMT
Set-Cookie: WMF-Last-Access-Global=26-Nov-2020;Path=/;Domain=.wikipedia.org;HttpOnly;secure;Expires=Mon, 28 Dec 2020 12:00:00 GMT
X-Client-IP: 96.253.39.8
Set-Cookie: GeoIP=US:RI:Providence:41.83:-71.40:v4; Path=/; secure; Domain=.wikipedia.org
Content-Length: 0
Connection: keep-alive

"""

The important part is this line:

location: Potato

According to https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Location, the location header has this form:

location: <url>

Where <url> must be:

A relative (to the request URL) or absolute URL.

In this example, Wikipedia returns Potato as the value for <url>, so HTTP.jl/URIs.jl assume that Potato is either a relative or absolute URL. The problem is that Potato is clearly not a relative or absolute URL. So I think that Wikipedia is at fault here, for not providing a valid relative or absolute URL as the value of the location header.

@DilumAluthge
Member

I've written some code to address your specific use case.

The code:
import HTTP

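# Return the value of the first header whose name equals `key`.
# The comparison is case-sensitive, and this throws if no such header exists.
function _getindex(v::AbstractVector{<:Pair}, key)
    return v[findfirst(x -> x[1] == key, v)][2]
end

# Append a "/" to `s` unless it already ends with one.
function _add_trailing_slash(s::AbstractString)::String
    if endswith(s, "/")
        return s
    else
        return string(s, "/")
    end
end

# Send a HEAD request without following redirects. If the server answers with
# a 302, return the page name from the `location` header; otherwise return
# `page_name` unchanged.
function get_wikipedia_page_name(endpoint::AbstractString, page_name::AbstractString)
    url = string(_add_trailing_slash(endpoint), page_name)
    response = HTTP.head(url; redirect=false)
    if response.status == 302
        redirect_page_name = _getindex(response.headers, "location")
        return redirect_page_name
    else
        return page_name
    end
end

# Resolve the redirect by hand first, then GET the final page directly so that
# HTTP.jl never has to parse the relative `location` value itself.
function make_wikipedia_request(endpoint::AbstractString, page_name::AbstractString)
    actual_page_name = get_wikipedia_page_name(endpoint, page_name)
    url = string(_add_trailing_slash(endpoint), actual_page_name)
    response = HTTP.get(url)
    return response
end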

Example usage:

julia> get_wikipedia_page_name("https://en.wikipedia.org/api/rest_v1/page/summary", "Potato")
"Potato"

julia> get_wikipedia_page_name("https://en.wikipedia.org/api/rest_v1/page/summary", "Potatoes")
"Potato"

julia> make_wikipedia_request("https://en.wikipedia.org/api/rest_v1/page/summary", "Potato")
HTTP.Messages.Response:
"""
HTTP/1.1 200 OK
content-type: application/json; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/Summary/1.5.0"
cache-control: s-maxage=1209600, max-age=300
content-language: en
vary: Accept-Encoding
content-location: https://en.wikipedia.org/api/rest_v1/page/summary/Potato
access-control-allow-origin: *
access-control-allow-methods: GET,HEAD
access-control-allow-headers: accept, content-type, content-length, cache-control, accept-language, api-user-agent, if-match, if-modified-since, if-none-match, dnt, accept-encoding
access-control-expose-headers: etag
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
referrer-policy: origin-when-cross-origin
x-xss-protection: 1; mode=block
content-security-policy: default-src 'none'; frame-ancestors 'none'
x-content-security-policy: default-src 'none'; frame-ancestors 'none'
x-webkit-csp: default-src 'none'; frame-ancestors 'none'
x-request-id: ba8bd65a-c2e9-4019-9873-7b55b27be6d8
server: restbase1030
date: Thu, 26 Nov 2020 22:09:55 GMT
ETag: W/"990856045/133d9e10-3034-11eb-ab69-1df7aab4ce4d"
Age: 6018
X-Cache: cp1075 hit, cp1075 hit/54
X-Cache-Status: hit-front
Server-Timing: cache;desc="hit-front"
Strict-Transport-Security: max-age=106384710; includeSubDomains; preload
Report-To: { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
NEL: { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
Set-Cookie: WMF-Last-Access=26-Nov-2020;Path=/;HttpOnly;secure;Expires=Mon, 28 Dec 2020 12:00:00 GMT
Set-Cookie: WMF-Last-Access-Global=26-Nov-2020;Path=/;Domain=.wikipedia.org;HttpOnly;secure;Expires=Mon, 28 Dec 2020 12:00:00 GMT
X-Client-IP: 96.253.39.8
Set-Cookie: GeoIP=US:RI:Providence:41.83:-71.40:v4; Path=/; secure; Domain=.wikipedia.org
Accept-Ranges: bytes
Content-Length: 1592
Connection: keep-alive

{"type":"standard","title":"Potato","displaytitle":"Potato","namespace":{"id":0,"text":""},"wikibase_item":"Q10998","titles":{"canonical":"Potato","normalized":"Potato","display":"Potato"},"pageid":23501,"thumbnail":{"source":"https://upload.wikimedia.org/wikipedia/commons/thumb/a/ab/Patates.jpg/320px-Patates.jpg","width":320,"height":208},"originalimage":{"source":"https://upload.wikimedia.org/wikipedia/commons/a/ab/Patates.jpg","width":2864,"height":1861},"lang":"en","dir":"ltr","revision":"990856045","tid":"13063a60-3034-11eb-8573-79ce47bbbc76","timestamp":"2020-11-26T22:09:25Z","description":"Plant species producing the tuber used as a staple food","description_source":"local","content_urls":{"desktop":{"page":"https://en.wikipedia.org/wiki/Potato","revisions":"https://en.wikipedia.org/wiki/Potato?action=history","edit":"https://en.wikipedia.org/wiki/Potato?action=edit","talk":"https://en.wikipedia.org/wiki/Talk:Potato"},"mobile":{"page":"https://en.m.wikipedia.org/wiki/Potato","re

1592-byte body
"""

julia> make_wikipedia_request("https://en.wikipedia.org/api/rest_v1/page/summary", "Potatoes")
HTTP.Messages.Response:
"""
HTTP/1.1 200 OK
content-type: application/json; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/Summary/1.5.0"
cache-control: s-maxage=1209600, max-age=300
content-language: en
vary: Accept-Encoding
content-location: https://en.wikipedia.org/api/rest_v1/page/summary/Potato
access-control-allow-origin: *
access-control-allow-methods: GET,HEAD
access-control-allow-headers: accept, content-type, content-length, cache-control, accept-language, api-user-agent, if-match, if-modified-since, if-none-match, dnt, accept-encoding
access-control-expose-headers: etag
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
referrer-policy: origin-when-cross-origin
x-xss-protection: 1; mode=block
content-security-policy: default-src 'none'; frame-ancestors 'none'
x-content-security-policy: default-src 'none'; frame-ancestors 'none'
x-webkit-csp: default-src 'none'; frame-ancestors 'none'
x-request-id: ba8bd65a-c2e9-4019-9873-7b55b27be6d8
server: restbase1030
date: Thu, 26 Nov 2020 22:09:55 GMT
ETag: W/"990856045/133d9e10-3034-11eb-ab69-1df7aab4ce4d"
Age: 6020
X-Cache: cp1075 hit, cp1075 hit/55
X-Cache-Status: hit-front
Server-Timing: cache;desc="hit-front"
Strict-Transport-Security: max-age=106384710; includeSubDomains; preload
Report-To: { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
NEL: { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
Set-Cookie: WMF-Last-Access=26-Nov-2020;Path=/;HttpOnly;secure;Expires=Mon, 28 Dec 2020 12:00:00 GMT
Set-Cookie: WMF-Last-Access-Global=26-Nov-2020;Path=/;Domain=.wikipedia.org;HttpOnly;secure;Expires=Mon, 28 Dec 2020 12:00:00 GMT
X-Client-IP: 96.253.39.8
Set-Cookie: GeoIP=US:RI:Providence:41.83:-71.40:v4; Path=/; secure; Domain=.wikipedia.org
Accept-Ranges: bytes
Content-Length: 1592
Connection: keep-alive

{"type":"standard","title":"Potato","displaytitle":"Potato","namespace":{"id":0,"text":""},"wikibase_item":"Q10998","titles":{"canonical":"Potato","normalized":"Potato","display":"Potato"},"pageid":23501,"thumbnail":{"source":"https://upload.wikimedia.org/wikipedia/commons/thumb/a/ab/Patates.jpg/320px-Patates.jpg","width":320,"height":208},"originalimage":{"source":"https://upload.wikimedia.org/wikipedia/commons/a/ab/Patates.jpg","width":2864,"height":1861},"lang":"en","dir":"ltr","revision":"990856045","tid":"13063a60-3034-11eb-8573-79ce47bbbc76","timestamp":"2020-11-26T22:09:25Z","description":"Plant species producing the tuber used as a staple food","description_source":"local","content_urls":{"desktop":{"page":"https://en.wikipedia.org/wiki/Potato","revisions":"https://en.wikipedia.org/wiki/Potato?action=history","edit":"https://en.wikipedia.org/wiki/Potato?action=edit","talk":"https://en.wikipedia.org/wiki/Talk:Potato"},"mobile":{"page":"https://en.m.wikipedia.org/wiki/Potato","re

1592-byte body
"""

@fredrikekre
Member

Duplicate of #435.

@yurivish
Author

yurivish commented Nov 27, 2020

@DilumAluthge Thanks for the thorough investigation! That's a fun one – I'll see about reporting it to the Wikipedia folks!

Would there be a cost or downside to supporting this behavior in HTTP.jl, since it appears in the wild despite not being standards-compliant?

@DilumAluthge
Member

It's possible that I'm wrong. This is just based on my reading of the Mozilla documentation that I linked above.

@fredrikekre
Member

Curl and Firefox handle it, so pretty sure HTTP.jl is at fault.

@yurivish
Author

yurivish commented Nov 27, 2020

The official spec does say that the value should be an absolute URL: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html (see section "14.30 Location")

Curl and Firefox handle it, so pretty sure HTTP.jl is at fault.

Yeah, I wound up just shelling out to curl.

@DilumAluthge
Member

DilumAluthge commented Nov 27, 2020

302 redirects are definitely allowed to give you relative URLs.

So the question is: is Potato a valid relative URL?

Certainly /Potato and ./Potato are valid relative URLs.

But is Potato a valid relative URL?
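
For reference, here is a minimal sketch of how RFC 3986 (section 5.2.3, "merge") resolves a bare path segment against a base path. It is an illustration only, not the code in URIs.jl or HTTP.jl: `merge_relative_path` is a hypothetical helper, it handles only path-only references (no scheme, authority, query, or fragment), and it skips dot-segment removal (`.` and `..`).

```julia
# Hypothetical helper sketching the RFC 3986 section 5.2.3 merge step for a
# path-only relative reference. Not part of HTTP.jl or URIs.jl.
function merge_relative_path(base_path::AbstractString, ref::AbstractString)
    # A reference starting with "/" is an absolute path and replaces the base path.
    startswith(ref, "/") && return String(ref)
    # Otherwise keep everything up to and including the last "/" of the base
    # path and append the reference.
    i = findlast('/', base_path)
    # Assumption: the base URL has an authority (as http/https URLs do), so an
    # empty base path merges to "/" followed by the reference.
    i === nothing && return string("/", ref)
    return string(base_path[1:i], ref)
end

merged = merge_relative_path("/api/rest_v1/page/summary/Potatoes", "Potato")
println(merged)  # prints "/api/rest_v1/page/summary/Potato"
```

Under that rule, `location: Potato` on a request for `/api/rest_v1/page/summary/Potatoes` points at `/api/rest_v1/page/summary/Potato`, so a bare segment would count as a relative reference.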

@DilumAluthge
Member

Curl and Firefox handle it, so pretty sure HTTP.jl is at fault.

Yeah that is definitely concerning.

@fredrikekre
Member

The official spec does say that the value should be an absolute URL: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html (see section "14.30 Location")

RFC 2616 has been replaced according to https://stackoverflow.com/a/25643550/5087136

@yurivish
Author

Oh, oops, thanks for catching that.

@DilumAluthge
Member

DilumAluthge commented Nov 27, 2020

I think this should be fixed by JuliaWeb/URIs.jl#13

@DilumAluthge
Member

@yurivish Can you try checking out JuliaWeb/URIs.jl#13 locally to confirm that it fixes the issue for you?

julia> import Pkg

julia> Pkg.add(Pkg.PackageSpec(url = "https://github.com/DilumAluthge/URIs.jl", rev = "dpa/relative-uris"))

@yurivish
Author

yurivish commented Nov 27, 2020

@DilumAluthge Running those two lines before GETting Potatoes still returns the same error (both with and without an explicit redirect=true). Do I need to do something to make HTTP.jl use that specific version of URIs?

]st lists it as

  [5c2747f8] URIs v1.1.0 `https://github.com/DilumAluthge/URIs.jl#dpa/relative-uris`

with no other mentions of the package, but I don't really know how Julia's package management works in much detail.

@DilumAluthge
Member

Close and reopen Julia, and then try the GET commands again.

@yurivish
Author

yurivish@Compy ~ % julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.3 (2020-11-09)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using HTTP

julia> HTTP.get("https://en.wikipedia.org/api/rest_v1/page/summary/Potatoes", redirect=true)
ERROR: ArgumentError: merge(::HTTP.URIs.URI; scheme::SubString{String}, userinfo::SubString{String}, host::SubString{String}, port::SubString{String}, path::SubString{String}, query::SubString{String}, fragment::SubString{String}) requires !(scheme in ["http", "https"]) || (isempty(path) || path[1] == '/')
Stacktrace:
 [1] macro expansion at /Users/yurivish/.julia/packages/HTTP/IAI92/src/debug.jl:52 [inlined]
 [2] merge(::HTTP.URIs.URI; scheme::SubString{String}, userinfo::SubString{String}, host::SubString{String}, port::SubString{String}, path::SubString{String}, query::SubString{String}, fragment::SubString{String}) at /Users/yurivish/.julia/packages/HTTP/IAI92/src/URIs.jl:81
 [3] absuri at /Users/yurivish/.julia/packages/HTTP/IAI92/src/URIs.jl:421 [inlined]
 [4] absuri(::SubString{String}, ::HTTP.URIs.URI) at /Users/yurivish/.julia/packages/HTTP/IAI92/src/URIs.jl:409
 [5] request(::Type{RedirectLayer{BasicAuthLayer{MessageLayer{RetryLayer{ExceptionLayer{ConnectionPoolLayer{StreamLayer{Union{}}}}}}}}}, ::String, ::HTTP.URIs.URI, ::Array{Pair{SubString{String},SubString{String}},1}, ::Array{UInt8,1}; redirect_limit::Int64, forwardheaders::Bool, kw::Base.Iterators.Pairs{Symbol,Bool,Tuple{Symbol},NamedTuple{(:redirect,),Tuple{Bool}}}) at /Users/yurivish/.julia/packages/HTTP/IAI92/src/RedirectRequest.jl:35
 [6] request(::String, ::String, ::Array{Pair{SubString{String},SubString{String}},1}, ::Array{UInt8,1}; headers::Array{Pair{SubString{String},SubString{String}},1}, body::Array{UInt8,1}, query::Nothing, kw::Base.Iterators.Pairs{Symbol,Bool,Tuple{Symbol},NamedTuple{(:redirect,),Tuple{Bool}}}) at /Users/yurivish/.julia/packages/HTTP/IAI92/src/HTTP.jl:314
 [7] #get#12 at /Users/yurivish/.julia/packages/HTTP/IAI92/src/HTTP.jl:391 [inlined]
 [8] top-level scope at REPL[2]:1

with

[cd3eb016] HTTP v0.8.19
[5c2747f8] URIs v1.1.0 https://github.com/DilumAluthge/URIs.jl#dpa/relative-uris

@DilumAluthge
Member

Make sure you are using HTTP 0.9.0.

@yurivish
Author

That worked. I had to uninstall Pluto because it was preventing the upgrade of HTTP to 0.9.0, but it looks like this is fixed with HTTP 0.9.0 and your bugfix to URIs!
