



Binary data with HTTP Api polluted?



#1 James0x57


Posted 21 May 2013 - 02:16 AM

I know http.get() returns a handle as if it were a file opened in read-only text mode, but the data is still just bytes. It seems that sometimes an extra byte gets thrown in, though... It's reproducible, but I haven't seen a pattern or reason why.
I have searched several times for an answer since January but have not been able to find one.

Take this program (the actual thing I'm working on) and save it as 'getNBS' for an example:
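local args = {...};
if #args == 2
    and string.find(args[1], ".nbs", -4, true) ~= nil
    and string.find(args[2], ".nbs", -4, true) ~= nil then
  local url = args[1];
  local fn = args[2];
  local resp = http.get(url);              -- nil if the request failed
  if resp == nil then
    print("Could not download " .. url);
    return;
  end
  local dt = resp.readAll();               -- whole response body as one string
  resp.close();
  local fh = fs.open('songs/'..fn, 'wb');  -- binary write mode
  local ret = "";
  if dt ~= nil and #dt > 0 then
    for x = 1, #dt do                      -- every byte, including the last one
      local byt = string.byte(dt, x);
      ret = ret .. string.format("%02X ", byt); -- two hex digits per byte
      if #ret == 48 then                   -- 16 bytes per printed row
        print(ret);
        ret = "";
      end
      fh.write(byt);                       -- write the byte value as-is
    end
  end
  print(ret);
  fh.close();
end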

Running it like this should then download a binary file from the server and save it as a new file under songs/, printing the hex to the screen as it goes.
getNBS http://geckocodes.org/James0x57/zeldaSecret.nbs test.nbs

If you use your browser to download the same file, you can open it in a hex editor to compare. At offset 0x27 the byte is 0xE8, but when you read it through the HTTP API, that single byte becomes the two bytes 0xC3 0xA8.
The rest of the data is intact. Since the file doesn't end in 0x0A or 0x0D (EOL characters), when you do readAll(), it should be an exact duplicate of the data. (...right?)
Interestingly, the same byte is affected in the same way when you read the handle returned by http.get() with repeated .readLine() calls.

It could potentially be an issue with string.byte() too...
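A quick way to find offsets like 0x27 without scanning two hex dumps by eye is to compare the two copies byte by byte. The sketch below is plain Lua; firstDifference is a hypothetical helper (not part of ComputerCraft), and the two short strings are toy stand-ins for the browser download and the HTTP API copy rather than the real file.

```lua
-- Hypothetical helper: return the first 0-based offset (as a hex editor
-- shows it) at which two byte strings differ, or nil if they are identical.
local function firstDifference(expected, actual)
  local n = math.min(#expected, #actual)
  for i = 1, n do
    if string.byte(expected, i) ~= string.byte(actual, i) then
      return i - 1
    end
  end
  if #expected ~= #actual then
    return n -- one string is a prefix of the other
  end
  return nil
end

-- Toy data: 0xE8 in the original vs. 0xC3 0xA8 in the HTTP API copy.
local original = "AB" .. string.char(0xE8) .. "CD"
local viaHttp  = "AB" .. string.char(0xC3, 0xA8) .. "CD"
print(string.format("first difference at offset 0x%X",
                    firstDifference(original, viaHttp)))
```

Offsets are reported 0-based so they line up with what a hex editor such as HxD displays.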

Any thoughts or insight into this?

End goal of this topic is to determine if this behavior is predictable so I can address and correct it on the fly OR, if it's actually a bug, to shine some light on it.

Thank you for your time!

(and thank you very much for ComputerCraft!)





EDIT:
Whoa, just figured it out. No idea why it came to me, but I thought of an accented e right after I posted, then checked:

Unicode code point:
è = U+00E8

UTF-8 encoding (hex):
è = C3 A8

So that's the issue: the byte is being read as a character code point and re-encoded as UTF-8. Odd, though, because it automatically assumes a 00 on the left when it reads the 0xE8 byte by itself, making the 16-bit code point U+00E8, and the string handed back to Lua then holds its UTF-8 encoding, C3 A8, which is what string.byte() reports.
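The diagnosis above can be checked in a few lines of plain Lua: encoding code point U+00E8 as UTF-8 gives exactly the two bytes seen at offset 0x27. utf8Encode is a hypothetical helper written for illustration, since Lua 5.1 has no utf8 library; it only covers code points up to U+FFFF and does not reject surrogates.

```lua
-- Hypothetical helper: encode one code point (U+0000..U+FFFF) as UTF-8.
local function utf8Encode(cp)
  if cp < 0x80 then
    return string.char(cp)                              -- 0xxxxxxx
  elseif cp < 0x800 then
    return string.char(0xC0 + math.floor(cp / 0x40),    -- 110xxxxx
                       0x80 + cp % 0x40)                -- 10xxxxxx
  else
    return string.char(0xE0 + math.floor(cp / 0x1000),      -- 1110xxxx
                       0x80 + math.floor(cp / 0x40) % 0x40, -- 10xxxxxx
                       0x80 + cp % 0x40)                    -- 10xxxxxx
  end
end

-- The lone source byte 0xE8, read as code point U+00E8:
local s = utf8Encode(0xE8)
print(string.format("%02X %02X", string.byte(s, 1, 2)))
```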

I'll dig into a solution after work tomorrow. I'd still like a thread, if possible, to share whatever I find. Hopefully it'll help someone else out.

Edited by Lyqyd, 21 May 2013 - 01:54 PM.


#2 Lyqyd


Posted 21 May 2013 - 01:53 PM

Split into new topic.

Interestingly, UTF issues were the first thing that came to mind as I was reading your description of the problem. I'm not surprised that that turned out to be the case.

#3 James0x57


Posted 24 May 2013 - 03:53 AM

This turned out to be a much bigger pita than I was hoping, but... I made a program that can download binary files over HTTP in ComputerCraft and convert them back for you as automatically as possible.
About 97% of the possible byte values convert automatically.

It's kind of annoying to use when a file converts badly... but it's the only thing we have for now, and there's not really a way around it that doesn't involve a server-side service.

Why
Since the binary file has to be fetched in text mode when we use the HTTP API, there are several things we need to fix and check manually when converting it back to binary.
This program will walk you through them if they are encountered. You can get lucky and hit several binary files that convert with no problem.
But... If your source file has a bunch of 0x0D bytes in it, you're going to hate this program... even though you'll eventually have an identical copy of the file on your favorite HTTP API Enabled server. <_<
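One way to undo the text-mode conversion is to decode the UTF-8 back into code points and map each one to the byte it came from. The five ambiguous bytes listed further down (0x81, 0x8D, 0x8F, 0x90, 0x9D) happen to be exactly the five positions left undefined in the Windows-1252 code page, which suggests, though nothing in this thread confirms it, that the response was decoded as Windows-1252 and re-encoded as UTF-8. The sketch below is built on that assumption; it is not the bHTTP source from the pastebin link, recoverBytes and the CP1252 table are hypothetical names, it assumes well-formed UTF-8, and it cannot repair 0x0D/0x0A changes, which leave nothing behind to decode.

```lua
-- ASSUMPTION: the HTTP API decoded the body as Windows-1252 and gave Lua the
-- UTF-8 encoding of each character. This table reverses Windows-1252 for
-- 0x80-0x9F; 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined there, so they
-- cannot be recovered.
local CP1252 = {
  [0x20AC] = 0x80, [0x201A] = 0x82, [0x0192] = 0x83, [0x201E] = 0x84,
  [0x2026] = 0x85, [0x2020] = 0x86, [0x2021] = 0x87, [0x02C6] = 0x88,
  [0x2030] = 0x89, [0x0160] = 0x8A, [0x2039] = 0x8B, [0x0152] = 0x8C,
  [0x017D] = 0x8E, [0x2018] = 0x91, [0x2019] = 0x92, [0x201C] = 0x93,
  [0x201D] = 0x94, [0x2022] = 0x95, [0x2013] = 0x96, [0x2014] = 0x97,
  [0x02DC] = 0x98, [0x2122] = 0x99, [0x0161] = 0x9A, [0x203A] = 0x9B,
  [0x0153] = 0x9C, [0x017E] = 0x9E, [0x0178] = 0x9F,
}

-- Hypothetical helper: turn the text from readAll() back into an array of
-- byte values. Characters that cannot be mapped back become false.
local function recoverBytes(text)
  local out, i, n = {}, 1, #text
  while i <= n do
    local b1 = string.byte(text, i)
    local cp, len
    if b1 < 0x80 then                                    -- 1-byte sequence
      cp, len = b1, 1
    elseif b1 >= 0xC0 and b1 < 0xE0 and i + 1 <= n then  -- 2-byte sequence
      cp = (b1 - 0xC0) * 0x40 + (string.byte(text, i + 1) - 0x80)
      len = 2
    elseif b1 >= 0xE0 and b1 < 0xF0 and i + 2 <= n then  -- 3-byte sequence
      local b2, b3 = string.byte(text, i + 1, i + 2)
      cp = (b1 - 0xE0) * 0x1000 + (b2 - 0x80) * 0x40 + (b3 - 0x80)
      len = 3
    else                                                 -- not handled here
      cp, len = nil, 1
    end
    if cp and (cp < 0x80 or (cp >= 0xA0 and cp <= 0xFF)) then
      out[#out + 1] = cp                            -- same value as the byte
    else
      out[#out + 1] = (cp and CP1252[cp]) or false  -- false: check by hand
    end
    i = i + len
  end
  return out
end

local recovered = recoverBytes("A" .. string.char(0xC3, 0xA8))
print(#recovered, recovered[1], recovered[2])
```

Writing the numeric entries back out with fs.open(path, 'wb') and fh.write(b), as getNBS does, would give the binary file; the false entries are the ones to look up in the hex editor.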

NOTE: You'll want a hex editor on your desktop (like HxD), with the file you're downloading into ComputerCraft open in it.


Here's the source:
http://pastebin.com/U0RSyAj0

You can download the program onto your computercraft computers using this command:
pastebin get U0RSyAj0 bHTTP

Then to download a binary file, you type:
bHTTP <url of binary file> <destination file name>

Like so:
bHTTP http://geckocodes.org/James0x57/zeldaSecret.nbs songs/test.nbs

You will have to use the hex editor to verify/correct each instance of the following through the magic of boring prompts:
1) Any time the current byte is 0x0A, because ALL 0x0D source bytes are converted into 0x0A.
1.2) If the current byte is 0x0A, the source may also have had 0x0D0A (\r\n); those pairs are likewise replaced with a single 0x0A. The prompts correct it.

2) Any time the byte in the source file is one of these 5 bytes:
0x81, 0x8D, 0x8F, 0x90, 0x9D
All five of these come out identical when read through the HTTP API, so it is impossible to know or guess which one the source had.

3) Any time the file ends in 0x0A or 0x0D, at least one byte will be stripped from the end of your file.
The prompts will give you the info about what it has finished with and what the offset is, so you can verify and correct if needed.
Since most files will not end in either of these characters, if you pass in 0 as the 3rd parameter of bHTTP, it will skip this prompt automatically.
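The three rules above boil down to a list of offsets to look up in the hex editor. A minimal sketch, assuming the recovered data is a Lua array of byte values with false marking bytes that could not be recovered (an input format invented for this example, not how bHTTP stores it); offsetsToVerify and report are hypothetical helpers. Offsets are 0-based positions in the recovered data: every 0x0D0A pair that collapsed to 0x0A shifts later offsets one lower than in the source file, and a stripped trailing byte can only be spotted by comparing file sizes.

```lua
-- Hypothetical helper: list the 0-based offsets that need checking by hand.
-- bytes: array of byte values, with false for unrecoverable bytes (rule 2).
local function offsetsToVerify(bytes)
  local todo = {}
  for i, b in ipairs(bytes) do -- ipairs keeps going past false entries
    if b == 0x0A then
      todo[#todo + 1] = { offset = i - 1,
                          why = "0x0A: source may be 0x0A, 0x0D or 0x0D0A" }
    elseif b == false then
      todo[#todo + 1] = { offset = i - 1,
                          why = "one of 0x81/0x8D/0x8F/0x90/0x9D" }
    end
  end
  return todo
end

-- Hypothetical helper: print the checklist plus the size for rule 3.
local function report(bytes)
  for _, item in ipairs(offsetsToVerify(bytes)) do
    print(string.format("0x%X  %s", item.offset, item.why))
  end
  -- Rule 3: a trailing 0x0A/0x0D may have been stripped, so compare sizes.
  print(string.format("recovered %d bytes; compare with the source size", #bytes))
end

report({ 0x41, 0x0A, false, 0x42 })
```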

The rest is converted automatically. Hopefully I haven't overlooked anything huge. It's working well for me though!

Hope it helps!




