http not working for google search


14 replies to this topic

#1 PixelToast

  • Signature Abuser
  • 2,265 posts
  • Location: 3232235883

Posted 20 November 2012 - 01:23 PM

normally the http api works fine
but
http.request("http://www.google.com/search?q=anything")
always fails instantly, anyone know why? :s

EDIT:
it's because the user agent sent when connecting to google is one that google blocks
this cannot be fixed from lua
(unless you use a custom proxy)

#2 Kingdaro

    The Doctor

  • Members
  • 1,636 posts
  • Location: 'MURICA

Posted 20 November 2012 - 01:39 PM

Try https? (if cc can even use https, that is)

#3 PixelToast

Posted 20 November 2012 - 01:42 PM

Kingdaro, on 20 November 2012 - 01:39 PM, said:

Try https? (if cc can even use https, that is)
nope, it complains that it's not http

the problem could be that it takes a couple of seconds for google to send results, and the http api queues http_failure before the response arrives

#4 Lyqyd

    Lua Liquidator

  • Moderators
  • 8,465 posts

Posted 20 November 2012 - 02:00 PM

Are you using http.request correctly? You know it throws an event with the results rather than returning them, correct?

#5 PixelToast

Posted 20 November 2012 - 02:19 PM

Lyqyd, on 20 November 2012 - 02:00 PM, said:

Are you using http.request correctly? You know it throws an event with the results rather than returning them, correct?
yes, i know

#6 Lyqyd

Posted 20 November 2012 - 02:48 PM

We'll need to see the code, then.

#7 PixelToast

Posted 20 November 2012 - 04:08 PM

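http.request("http://www.google.com/search?q=define+greece")
local response
-- http.request returns immediately; wait for the result event
while true do
  local event, url, handle = os.pullEvent()
  if event == "http_success" then
    response = handle.readAll()
    handle.close()
    break
  elseif event == "http_failure" then
    error("http_failure")
  end
end
-- plain find (4th argument true): the "-" in the marker is a Lua pattern character
local marker = '<td valign="top" style="padding-bottom:5px;padding-top:5px"><table class="ts"><tr><td>'
local _, a = string.find(response, marker, 1, true)
if not a then
  print("Definition not found.")
else
  -- if no closing tag is found, print through to the end of the response
  local b = string.find(response, '</td>', a, true) or (#response + 1)
  print(string.sub(response, a + 1, b - 1))
end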
it errors http_failure,
but when you change line 1 to
http.request("http://www.google.com/")
it will say definition not found instead of erroring

#8 dissy

  • Members
  • 181 posts

Posted 20 November 2012 - 06:14 PM

Does anyone happen to know what HTTP_USER_AGENT is sent by the HTTP API?
If it contains Java or Lua, it is quite possible google is blocking search requests from it. They do that for a lot of scripting languages.

I've run into that problem with both Java and TCL, when using the default user agent. I had to change the user agent to match FireFox (Which unfortunately is against the Google terms of service) to get non-error result back.

#9 Espen

    Curious Explorer

  • Members
  • 708 posts

Posted 21 November 2012 - 12:39 AM

dissy, on 20 November 2012 - 06:14 PM, said:

Does anyone happen to know what HTTP_USER_AGENT is sent by the HTTP API?
If it contains Java or Lua, it is quite possible google is blocking search requests from it. They do that for a lot of scripting languages.

I've run into that problem with both Java and TCL, when using the default user agent. I had to change the user agent to match FireFox (Which unfortunately is against the Google terms of service) to get non-error result back.
It sends a Java UserAgent information.
You can find out the exact string by using the HTTP API on, e.g. http://whatsmyuseragent.com/ and then looking for "Your User Agent" in the response body.
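
A minimal sketch of that check, assuming the stock ComputerCraft http API. The exact markup of the page isn't known here, so the sketch simply prints whatever text follows the "Your User Agent" label. sliceAfter and fetch are helper names made up for this example.

```lua
-- Return up to n characters of body that follow the first literal
-- occurrence of marker, or nil if marker is absent.
local function sliceAfter(body, marker, n)
  -- 4th argument true = plain find, so marker is not treated as a pattern
  local _, stop = string.find(body, marker, 1, true)
  if not stop then return nil end
  return string.sub(body, stop + 1, stop + n)
end

-- Hypothetical helper: request url and block until the result event arrives.
-- Returns the response body, or nil on http_failure.
local function fetch(url)
  http.request(url)
  while true do
    local event, _, handle = os.pullEvent()
    if event == "http_success" then
      local body = handle.readAll()
      handle.close()
      return body
    elseif event == "http_failure" then
      return nil
    end
  end
end

-- Only runs inside ComputerCraft, where the http API exists.
if http then
  local body = fetch("http://whatsmyuseragent.com/")
  if not body then
    print("request failed")
  else
    print(sliceAfter(body, "Your User Agent", 200) or "label not found")
  end
end
```

The 200-character window is arbitrary; widen it if the string sits further down the page.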

#10 PixelToast

Posted 21 November 2012 - 05:07 AM

Espen, on 21 November 2012 - 12:39 AM, said:

It sends a Java UserAgent information.
You can find out the exact string by using the HTTP API on, e.g. http://whatsmyuseragent.com/ and then looking for "Your User Agent" in the response body.
the user agent is " java/1.7.0_07 "
found with a simple recode of the program i posted above


#11 dissy

Posted 21 November 2012 - 12:30 PM

Sorry it's taken so long for me to verify this, but Google search does indeed block the Java user agent.
I've confirmed this using wget and the -U option (to specify a user agent)

This is with a FireFox user agent:
$ wget -U "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0" http://www.google.com/search?q=kittens
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `search?q=kittens'

This is the exact same query made 5 seconds later with the Java agent:
$ wget -U "Java/1.7.0_07" http://www.google.com/search?q=kittens
HTTP request sent, awaiting response... 403 Forbidden
2012-11-20 18:25:14 ERROR 403: Forbidden.

Unfortunately there doesn't appear to be any way to "fix" this directly in Lua, short of using a different search engine.
This might help: http://en.wikipedia...._search_engines

#12 PixelToast

Posted 21 November 2012 - 04:07 PM

dissy, on 21 November 2012 - 12:30 PM, said:

Sorry it's taken so long for me to verify this, but Google search does indeed block the Java user agent.
I've confirmed this using wget and the -U option (to specify a user agent)

This is with a FireFox user agent:
$ wget -U "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0" http://www.google.com/search?q=kittens
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `search?q=kittens'

This is the exact same query made 5 seconds later with the Java agent:
$ wget -U "Java/1.7.0_07" http://www.google.com/search?q=kittens
HTTP request sent, awaiting response... 403 Forbidden
2012-11-20 18:25:14 ERROR 403: Forbidden.

Unfortunately there doesn't appear to be any way to "fix" this directly in Lua, short of using a different search engine.
This might help: http://en.wikipedia...._search_engines
none of those seem to work :s
many of them dont define things
some (like bing) aren't accurate

bah, lemme see if a proxy works >_>

#13 dissy

Posted 21 November 2012 - 04:44 PM

Do you know PHP or Perl, and have a web-server you can put scripts on?
If so you could code a little proxy, where the CGI queries Google with an IE or Firefox user agent. Then your Lua program will hit your cgi/php...
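
The Lua side of that could look roughly like the sketch below. The proxy address http://example.com/proxy.php and its q parameter are placeholders for whatever your own script ends up using, and urlEncode and proxyUrl are helper names made up for the example.

```lua
-- Percent-encode everything except unreserved URL characters.
local function urlEncode(s)
  -- outer parentheses drop gsub's second return value (the match count)
  return (string.gsub(s, "[^%w%-_%.~]", function(c)
    return string.format("%%%02X", string.byte(c))
  end))
end

-- Build the request URL for a hypothetical proxy that takes the search term in ?q=
local function proxyUrl(base, term)
  return base .. "?q=" .. urlEncode(term)
end

-- Only runs inside ComputerCraft, where the http API exists.
if http then
  http.request(proxyUrl("http://example.com/proxy.php", "define greece"))
  -- then wait for http_success / http_failure with os.pullEvent() as usual
end
```

Encoding the term on the Lua side means spaces and characters like & in the search text cannot break the query string before it reaches the proxy.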

The only other way I know of to do this (a huge pita) is to sign up for their developer API.
With a developer API, they give you an API key, and you include that as part of the search query to a special URL.
They give you 100 queries per day for free, and then you have to pay to enable more queries ($5 per 1000 extra queries)

https://developers.g...rch/v1/overview

After all that mess, you can use a URL like this:

https://www.googleapis.com/customsearch/v1?key={INSERT-YOUR-KEY}&cx=017576662512468239146:omuauf_lfve&q={SEARCH-TERM}

They also have a JSON interface, if you wanted to go that route.
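
As a rough sketch, filling in that URL template from Lua could look like this. The key "YOUR-KEY" is a placeholder, the cx value is simply the one from the URL above, and urlEncode and customSearchUrl are helper names made up for the example.

```lua
-- Percent-encode everything except unreserved URL characters.
local function urlEncode(s)
  -- outer parentheses drop gsub's second return value (the match count)
  return (string.gsub(s, "[^%w%-_%.~]", function(c)
    return string.format("%%%02X", string.byte(c))
  end))
end

-- key and cx are the values Google issues for your account and search engine
local function customSearchUrl(key, cx, term)
  return "https://www.googleapis.com/customsearch/v1"
    .. "?key=" .. urlEncode(key)
    .. "&cx=" .. urlEncode(cx)
    .. "&q=" .. urlEncode(term)
end

print(customSearchUrl("YOUR-KEY", "017576662512468239146:omuauf_lfve", "define greece"))
```

Note that this is an https URL, and earlier in the thread the http API refused anything that was not plain http, so from inside ComputerCraft the request would probably still have to go through something that speaks http.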

#14 PixelToast

Posted 21 November 2012 - 05:14 PM

nah, id rather learn php and host it on moi xampp server :3

#15 Espen

Posted 21 November 2012 - 06:20 PM

I've just posted a suggestion about including Java-side setRequestProperties on the connection.
This would not only solve the User-Agent problem, but also allow one to set any header values, which enables a lot of other functions, e.g. sending cookies.
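
For illustration only, this is roughly what that could look like from Lua if the API accepted a table of headers. The second argument to http.get below is an assumption (nothing in this thread confirms the http API takes it), and the URL and header values are made up.

```lua
-- Hypothetical: header names mapped to values, passed alongside the URL.
local headers = {
  ["User-Agent"] = "MyCCProgram/1.0",
  ["Cookie"] = "session=abc123",
}

-- Only runs inside ComputerCraft. Whether the extra argument is honoured
-- depends on the http API supporting it, which is the assumption here.
if http and http.get then
  local handle = http.get("http://example.com/", headers)
  if handle then
    print(handle.readAll())
    handle.close()
  end
end
```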




