[RFC] Metadata file structure compatible with all file types


52 replies to this topic

#1 Lyqyd

    Lua Liquidator

  • Moderators
  • 8,465 posts

Posted 13 May 2014 - 06:02 PM

This RFC is proposed as an alternative system to the metadata RFC recently proposed by AmandaC.

The alternative system I propose is as follows:

A folder named ".meta" will be present in the root of the drive. Any program may create this folder if it is not present, but no program may delete this folder without explicit instruction to do so by the user (for instance, the user running "rm .meta").

Inside this folder, the folder structure present in the rest of the computer will be mirrored for any metadata present. A file named bar in the folder foo (/foo/bar) would have metadata present at "/.meta/foo/bar". The metadata file will be a valid Lua file declaring a number of values. No values are required to be present, but some values must be in the specified format if they are present. Each value is listed in the table below; an example metadata file follows:

Key:          Value:
name          string
author        string
version       string
description   string
icon          table
timestamp     table
type          string

The "icon" table would be formatted with each key of the table being the width and height of the icon image as a string, concatenated from width.."x"..height. The value of each would be a file location. The metadata of those files would be used for file type detection, if applicable.
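As a rough illustration of the flat icon table described above (the file paths and the helper name findIcon are invented for this sketch and are not part of the proposal):

```lua
-- Hypothetical icon table; the paths are placeholders.
local icon = {
  ["4x3"] = "/icons/sketch_small",
  ["8x6"] = "/icons/sketch_large",
}

-- Look up the icon file for an exact size; returns nil if none is declared.
local function findIcon(icons, width, height)
  return icons[width .. "x" .. height]
end

print(findIcon(icon, 4, 3))
print(findIcon(icon, 5, 5))
```

A lookup for a specific size is a single index operation, which is the main convenience of the flat layout.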

The timestamp table's keys would be any of the following values: created, modified, accessed. The value would be a standard time stamp format to be determined, but likely including in-game date and time in a string.

An example metadata file:

name = "Awesome Image"
author = "oeed"
type = "sketch"
timestamp = {
  accessed = "2, 12:45",
  modified = "1, 9:32",
  created = "1, 9:28",
}

Timestamp values are using an arbitrary format and do not constitute a formal proposal.

Here is an example function to parse a metadata file, given an exact path to the file. As you can see, storing the metadata as both a human-readable file and a valid Lua file has distinct advantages. Note that to maintain human-readability, textutils.serialize is not an option, due to its behavior on pre-1.6 versions of ComputerCraft. This function returns a table, where each key is one of the metadata attributes (name, version, description, etc.) and the values are the values declared in the metadata file:

function parseMeta(metaPath)
  if fs.exists(metaPath) then
    local handle = fs.open(metaPath, "r")
    if handle then
      -- Globals assigned by the metadata file end up in this table.
      local data = {}
      local fn, err = loadstring(handle.readAll())
      handle.close()
      if fn then
        -- Run the chunk with data as its environment (Lua 5.1 setfenv).
        setfenv(fn, data)
        local success, runErr = pcall(fn)
        if success then
          return data
        else
          return nil, runErr
        end
      else
        -- The file is not valid Lua; err holds the compile error.
        return nil, err
      end
    else
      return nil, "could not read metadata"
    end
  else
    return nil, "no metadata found"
  end
end

Obviously, getting the metadata for a file using the file's path is a simple matter:

function getMetadataForFile(path)
  return parseMeta(fs.combine(".meta", path))
end

To build a comprehensive table of metadata for all files on the computer, we simply need another pair of functions. The below functions implement a recursive solution, using a worker function to get all of the metadata for each file in a folder, and another function to correctly initialize the first function.

local function getMetadataForFolder(folderPath, metadata)
  if not metadata then metadata = {} end
  for _, file in ipairs(fs.list(folderPath)) do
    local path = fs.combine(folderPath, file)
    if fs.isDir(path) then
      getMetadataForFolder(path, metadata)
    else
      metadata[path] = parseMeta(path)
    end
  end
  return metadata
end

function getAllMetadata()
  if fs.exists(".meta") and fs.isDir(".meta") then
    return getMetadataForFolder(".meta")
  else
    return nil, "no metadata available"
  end
end

As you can see, the above two functions only iterate the files in the .meta folder to avoid extra file existence checks. If desired, the parseMeta function could be switched out for the getMetadataForFile function and an exception made to ignore the .meta folder when looking for files to check metadata on.
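A minimal sketch of that alternative, walking the real files and skipping the .meta folder. The name collectMetadata and its parameters are hypothetical; the fs-like table and the lookup function are passed in so the sketch can be tried outside ComputerCraft:

```lua
-- fsLib is expected to offer list/combine/isDir like ComputerCraft's fs API;
-- getMeta is a lookup function such as getMetadataForFile from this post.
local function collectMetadata(fsLib, getMeta, folderPath, metadata)
  metadata = metadata or {}
  for _, name in ipairs(fsLib.list(folderPath)) do
    local path = fsLib.combine(folderPath, name)
    if path == ".meta" then
      -- Skip the metadata tree itself.
    elseif fsLib.isDir(path) then
      collectMetadata(fsLib, getMeta, path, metadata)
    else
      -- Files without metadata simply get no entry (getMeta returns nil).
      metadata[path] = getMeta(path)
    end
  end
  return metadata
end
```

On a ComputerCraft computer this could presumably be called as collectMetadata(fs, getMetadataForFile, ""). Note that it would also descend into mounted folders such as rom, so it does more work than the .meta-only version.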

Packed metadata (for file transmission purposes) can be handled by packing the metadata in whatever format is desired and providing an unpacking function at the top of the file. If present, an unpacking function will be preceded by the comment "[email protected] unpack" and followed by the comment "[email protected] unpack". This function will be loaded into an environment including all built-in APIs and the current shell, and executed with one parameter: the full path of the file. This will allow the file to write its own metadata file and self-modify to remove the unpacking function if desired.

#2 zekesonxx

  • Signature Abuser
  • 263 posts
  • Location: Where you aren't

Posted 13 May 2014 - 06:23 PM

You really thought this through, didn't you?
I would think it would make the most sense to have the timestamps be UTC Unix timestamps, although I'm not sure if you can get those in CC's APIs without an external source (i.e. a web server).

#3 Lyqyd

Posted 13 May 2014 - 06:47 PM

A UTC-style timestamp would be useful, but any genuine UTC timestamps would require an external data source. Maybe there is an easy way to construct and decode a similar sort of numeric-only timestamp based on the Minecraft world date and time.

#4 apemanzilla

  • Members
  • 1,421 posts

Posted 13 May 2014 - 06:50 PM

Sorry, but wouldn't it be better to save icons as nested tables? It would allow you to iterate through the individual columns or rows more easily...

#5 Lyqyd

Posted 13 May 2014 - 07:03 PM

That's certainly a viable alternative. Both approaches have advantages and disadvantages. I do like the flat table approach, but two dimensional tables would be nice too. I don't think that iterating individual columns and rows would be done all that commonly, though. It seems more likely that one would want to look up a specific dimension set. The two-dimensional table does mean that that sort of lookup requires extra steps (you have to make sure the width table exists before indexing the width and length), so there are advantages both ways. I feel like looking up specific dimensions would be a more common task than iterating the entire table, so I suggested the flat table way.

#6 apemanzilla

Posted 13 May 2014 - 07:15 PM

Lyqyd, on 13 May 2014 - 07:03 PM, said:

That's certainly a viable alternative. Both approaches have advantages and disadvantages. I do like the flat table approach, but two dimensional tables would be nice too. I don't think that iterating individual columns and rows would be done all that commonly, though. It seems more likely that one would want to look up a specific dimension set. The two-dimensional table does mean that that sort of lookup requires extra steps (you have to make sure the width table exists before indexing the width and length), so there are advantages both ways. I feel like looking up specific dimensions would be a more common task than iterating the entire table, so I suggested the flat table way.
True. Also, how do you intend to get the height and width of the icons in flat form? Iterate over each item finding the maximum? With a 2D array it would be as simple as
local w,h = #icon, #icon[1]

Maybe it would be best to support both methods. I'd be willing to write a couple intermediate methods to unify it all into methods like icon:getPixel(1,2) or icon:getWidth().

#7 Lyqyd

Posted 13 May 2014 - 07:35 PM

Hmm. Supporting both formats adds significant complexity for trivial benefits. I'm not a fan of the using-both idea. The 2D array wouldn't work with the # shortcut, since it would be a sparse matrix and those shortcuts require a contiguous array-style table. One or the other of the two ways of doing it will have to be picked. They're both going to require iterating the table with a pairs() loop to look at every entry, of course. The two-dimensional method requires a (second) nested pairs loop as well, whereas the flat table way requires a string.match to pull dimensions out of the key. I'm fine with either way, but the flat table method definitely seems easier to me. Further thoughts? Anybody else have an opinion on flat vs. 2D for the icons?
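For the flat table, the pairs() plus string.match approach might look roughly like this. The helper name listIconSizes is hypothetical; it assumes keys of the form width.."x"..height as proposed in the first post:

```lua
-- Collect every declared icon size from a flat icon table.
local function listIconSizes(icons)
  local sizes = {}
  for key, path in pairs(icons) do
    -- Pull the two numbers out of a key such as "8x6".
    local w, h = string.match(key, "^(%d+)x(%d+)$")
    if w then
      sizes[#sizes + 1] = { width = tonumber(w), height = tonumber(h), path = path }
    end
  end
  -- pairs() order is unspecified, so sort for a stable result.
  table.sort(sizes, function(a, b)
    if a.width ~= b.width then return a.width < b.width end
    return a.height < b.height
  end)
  return sizes
end
```

Keys that do not match the pattern are ignored here; a stricter reader could report them instead.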

#8 apemanzilla

Posted 15 May 2014 - 01:39 AM

I'll let you choose since it's your idea :P

Additionally, how will file transfers be handled? (copying to a disk and then to another computer, uploading to pastebin, etc) With a disk you could just stick a .meta folder on the disk and hope the receiving end knows how to use it, but with pastebin it's not so easy...

#9 oeed

    Oversimplifier

  • Members
  • 2,095 posts
  • Location: Auckland, New Zealand

Posted 15 May 2014 - 02:38 AM

I quite like this idea, I might try to implement it in OneOS 1.2.


Any possibility of an API to do this, or is it something we need to make ourselves?

#10 theoriginalbit

    Semi-Professional ComputerCrafter

  • Moderators
  • 7,332 posts
  • Location: Australia

Posted 15 May 2014 - 02:51 AM

wait, what is `type` is it `program` vs `api`, or?

#11 oeed

Posted 15 May 2014 - 03:05 AM

theoriginalbit, on 15 May 2014 - 02:51 AM, said:

wait, what is `type` is it `program` vs `api`, or?

Based on the example I'd say it's the file type. So for example a .xls file would have the type 'excelspreadsheet' (or something like that).

I'd also say we need a table of these types, as some people may use other names.

So maybe something like, 'plaintext' for .txt and .text documents, and 'luacode' for .lua files.

Or something like that.


On the subject of timestamps, I feel that a table would be better (so '13:45, Day 1' would be {13, 45, 1}, maybe not in that order though). Otherwise you'll have to parse the string. In the case of OneOS, I don't want it to be too slow when browsing files.

Also, how do you propose writing changes?

#12 apemanzilla

Posted 15 May 2014 - 03:36 AM

oeed, on 15 May 2014 - 03:05 AM, said:

-- snip --

Maybe for filetypes, we can have a GitHub repo that has a file detailing each extension and their names:
{
  ["xls"] = {
    "excelspreadsheet",
    "ExcelSpreadsheet",
    "spreadsheet"
  },
  ["nfp"] = {
    "image",
    "paintimage"
  },
  ["txt"] = {
    "plaintext",
    "text"
  },
  ["lua"] = {
    "luacode",
    "code",
    "program"
  }
}
It would allow users to submit a PR or issue if they want a file type added for their files, and OSes could update the file from HTTP on boot or something along those lines.

If we wanted, we could even add default icons for each extension to the table structure if the OS doesn't know what to use.
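A small sketch of how an OS might consult such a table. The helper name typeForFile is hypothetical, and treating the first name in each list as the preferred one is an assumption of this sketch, not something the proposal specifies:

```lua
local fileTypes = {
  xls = { "excelspreadsheet", "ExcelSpreadsheet", "spreadsheet" },
  nfp = { "image", "paintimage" },
  txt = { "plaintext", "text" },
  lua = { "luacode", "code", "program" },
}

-- Return the first listed type name for a file's extension, or nil.
local function typeForFile(fileName)
  -- Extension = text after the last dot, provided no "/" follows that dot.
  local ext = string.match(fileName, "%.([^./]+)$")
  local names = ext and fileTypes[string.lower(ext)]
  return names and names[1] or nil
end

print(typeForFile("notes.txt"))
print(typeForFile("startup"))
```

Files with no extension, or with one the table does not list, return nil, so the caller can fall back to the "type" value in the metadata file.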

For timestamps:

Generating:
local timestamp = 24 * os.day() + os.time()

Parsing for day:
local day = math.floor(timestamp / 24)

Parsing for time:
local time = timestamp % 24

This would be fast, efficient and work nearly everywhere I can think of. You obviously wouldn't want to send timestamp data across worlds, but this is the easiest non-HTTP system.
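A worked round trip of the same arithmetic, written as plain functions so the day and time can be passed in. The function names are made up for this sketch; inside ComputerCraft the inputs would come from os.day() and os.time():

```lua
-- Pack day and fractional hour into one number (hours since day 0).
local function encodeTimestamp(day, time)
  return 24 * day + time
end

-- Split the packed number back into day and fractional hour.
local function decodeTimestamp(timestamp)
  local day = math.floor(timestamp / 24)
  local time = timestamp % 24
  return day, time
end

-- Render the fractional hour as H:MM for display.
local function formatTime(time)
  local hours = math.floor(time)
  local minutes = math.floor((time - hours) * 60)
  return string.format("%d:%02d", hours, minutes)
end

local stamp = encodeTimestamp(2, 12.75)  -- day 2, 12:45
local day, time = decodeTimestamp(stamp)
print(stamp, day, formatTime(time))
```

Here day 2 at 12:45 packs to 24 * 2 + 12.75 = 60.75, and decoding gives back day 2 and 12.75 hours.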

Edited by apemanzilla, 15 May 2014 - 03:44 AM.


#13 oeed

Posted 15 May 2014 - 04:27 AM

apemanzilla, on 15 May 2014 - 03:36 AM, said:

-- snip --

Yea, I like the GitHub idea. If I end up adding this to OneOS before someone else creates an API I'll make something like that.

Putting the time in a single number is a good idea too.

In OneOS, in terms of showing time modified, I'll probably convert it to a real 'time ago'. (So, for example, if it was made 1 hour ago in in-game time, I'll show 'A minute ago', or whatever the actual time is.)

#14 apemanzilla

Posted 15 May 2014 - 05:07 AM

oeed, on 15 May 2014 - 04:27 AM, said:

apemanzilla, on 15 May 2014 - 03:36 AM, said:

-- snip --

Yea, I like the GitHub idea. If I end up adding this to OneOS before someone else creates an API I'll make something like that.

Putting the time in a single number is a good idea too.

In OneOS, in terms of showing time modified, I'll probably convert it to a real 'time ago'. (So, for example, if it was made 1 hour ago in in-game time, I'll show 'A minute ago', or whatever the actual time is.)
Hmm... That would be possible, but likely inaccurate. You have to remember that in-game time isn't measured in seconds but in ticks; if TPS drops, the readings are off. If you want to do that, you'd probably be best off using real-world time.

I don't think we're quite ready for an API yet - we still have to figure out things like saving icons, icon image formats, keys and what they mean, etc.

Edited by apemanzilla, 15 May 2014 - 05:16 AM.


#15 oeed

Posted 15 May 2014 - 06:24 AM

apemanzilla, on 15 May 2014 - 05:07 AM, said:

Hmm... That would be possible, but likely inaccurate. You have to remember that in-game time isn't measured in seconds but in ticks; if TPS drops, the readings are off. If you want to do that, you'd probably be best off using real-world time.

I don't think we're quite ready for an API yet - we still have to figure out things like saving icons, icon image formats, keys and what they mean, etc.

I suppose you're right about the TPS issue, although it's better to be off by a bit than not be able to know when it was made at all. A real world time solution would be a lot better if it weren't for the issue of playing without internet.

Oh, I know it's not ready for an API yet; I'm more wondering whether someone else is considering making one.

#16 apemanzilla

Posted 15 May 2014 - 12:54 PM

oeed, on 15 May 2014 - 06:24 AM, said:

apemanzilla, on 15 May 2014 - 05:07 AM, said:

-- snip --

I suppose you're right about the TPS issue, although it's better to be off by a bit than not be able to know when it was made at all. A real world time solution would be a lot better if it weren't for the issue of playing without internet.

Oh, I know it's not ready for an API yet; I'm more wondering whether someone else is considering making one.
The thing with using the in-game time is that servers usually tend to slow down to 10-15 TPS over time, which means that your readings would quickly end up very far off.

On the topic of an API, I'd be happy to help write one for public use.

#17 Lyqyd

Posted 15 May 2014 - 02:40 PM

apemanzilla, on 15 May 2014 - 01:39 AM, said:

I'll let you choose since it's your idea :P

Additionally, how will file transfers be handled? (copying to a disk and then to another computer, uploading to pastebin, etc) With a disk you could just stick a .meta folder on the disk and hope the receiving end knows how to use it, but with pastebin it's not so easy...

Well, if it comes down to it, let's stick with the flat table format for icons for now.

oeed, on 15 May 2014 - 02:38 AM, said:

I quite like this idea, I might try to implement it in OneOS 1.2.


Any possibility of an API to do this, or is it something we need to make ourselves?

I'll probably put an API repo up on github for this, which everyone is welcome to contribute to. It will be relatively generic, mostly dealing with storing and retrieving the metadata. I will probably also work on a very simple fs override that implements automatic updating of the metadata using that API.

theoriginalbit, on 15 May 2014 - 02:51 AM, said:

wait, what is `type` is it `program` vs `api`, or?

Type is intended to be a short, one-word string that unambiguously describes the format of the file contents; formats such as "text", "lua", "nfp", "nfa", "paint", etc.

oeed, on 15 May 2014 - 03:05 AM, said:

Also, how do you propose writing changes?

I'm not sure what you're asking here. Could you clarify the question?

apemanzilla, on 15 May 2014 - 03:36 AM, said:

For timestamps:

Generating:
local timestamp = 24 * os.day() + os.time()

Parsing for day:
local day = math.floor(timestamp / 24)

Parsing for time:
local time = timestamp % 24

This would be fast, efficient and work nearly everywhere I can think of. You obviously wouldn't want to send timestamp data across worlds, but this is the easiest non-HTTP system.

That's a pretty reasonable timestamp format, nice and compact.

#18 CometWolf

  • Members
  • 1,283 posts

Posted 15 May 2014 - 02:51 PM

Lyqyd, on 13 May 2014 - 07:03 PM, said:

The two-dimensional table does mean that that sort of lookup requires extra steps (you have to make sure the width table exists before indexing the width and length), so there are advantages both ways.
There's a quick way around this, which I use for Turtle Architect: metatables! Seeing as the information is retrieved by executing the file as code anyway, it wouldn't be that hard to implement.
icon = setmetatable(
  {
    [1] = {1,2,3} -- you get the point
  },
  {
    __index = function(t,k)
      t[k] = {}
      return t[k]
    end
  }
)
As you might've guessed, I'd prefer 2D tables. It makes processing easier, in my opinion, if the values are just numbers. I don't know which would be lightest computation-wise, but maybe we should find out?
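One caveat: parseMeta as posted in #1 runs the metadata file with an empty table as its environment, so a call to setmetatable inside the file would find no such global unless the reader exposes it. One possible workaround, sketched here with a hypothetical helper named autoRows, is for the reader to attach the metatable after loading:

```lua
-- Make missing rows of a 2D table appear as (and become) empty tables.
local function autoRows(grid)
  return setmetatable(grid, {
    __index = function(t, k)
      local row = {}
      rawset(t, k, row)
      return row
    end,
  })
end

-- The inner table stands in for an icon value loaded from a metadata file.
local icon = autoRows({ [1] = { 1, 2, 3 } })
print(icon[1][2])
print(icon[5][1])
```

Indexing a row that was never declared yields nil for the cell instead of raising an "attempt to index" error, at the cost of creating an empty row table on first access.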

#19 Tiin57

    Java Lunatic

  • Members
  • 1,412 posts
  • Location: Indiana, United States

Posted 15 May 2014 - 09:12 PM

Why not use HTTP mime-type things? (Can't remember what they're called, but those "text/html" and "application/lua" strings in HTTP headers.)

#20 oeed

Posted 15 May 2014 - 09:43 PM

Lyqyd, on 15 May 2014 - 02:40 PM, said:

oeed, on 15 May 2014 - 03:05 AM, said:

Also, how do you propose writing changes?

I'm not sure what you're asking here. Could you clarify the question?

Well, when, for example, someone modifies the document and you need to change the modified date. It's just not the easiest format to allow editing really.
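One way this could work is to load the file into a table, change the value, and write the whole file back out in the same human-readable form. The sketch below is only an illustration: serializeMeta is a hypothetical name, and it assumes top-level keys are valid Lua identifiers, nested tables are one level deep with string keys, and values are strings or numbers.

```lua
-- Return the keys of a table in sorted order so output is deterministic.
local function sortedKeys(t)
  local keys = {}
  for k in pairs(t) do keys[#keys + 1] = k end
  table.sort(keys)
  return keys
end

-- Quote strings; leave numbers and other values to tostring.
local function formatValue(v)
  if type(v) == "string" then return string.format("%q", v) end
  return tostring(v)
end

-- Turn a metadata table back into the text of a metadata file.
local function serializeMeta(meta)
  local lines = {}
  for _, key in ipairs(sortedKeys(meta)) do
    local value = meta[key]
    if type(value) == "table" then
      lines[#lines + 1] = key .. " = {"
      for _, subKey in ipairs(sortedKeys(value)) do
        -- Bracketed keys keep names like "8x6" valid.
        lines[#lines + 1] = string.format("  [%q] = %s,", subKey, formatValue(value[subKey]))
      end
      lines[#lines + 1] = "}"
    else
      lines[#lines + 1] = key .. " = " .. formatValue(value)
    end
  end
  return table.concat(lines, "\n") .. "\n"
end
```

Updating the modified time would then be: parse the file, set timestamp.modified, and write serializeMeta's result back with fs.open(path, "w").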

Tiin57, on 15 May 2014 - 09:12 PM, said:

Why not use HTTP mime-type things? (Can't remember what they're called, but those "text/html" and "application/lua" strings in HTTP headers.)
That'd actually make more sense, so you can do things like image/nfp etc.


Also, I was thinking that it'd probably be best if part of the API was an fs API wrapper. So essentially every time you used things such as fs.open, it would change/create the metadata file.
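A very rough sketch of what such a wrapper could look like. The name wrapFs and the onModified callback are hypothetical; the wrapper only assumes the wrapped table has an open(path, mode) function like ComputerCraft's fs.open:

```lua
-- Return a table that behaves like fsLib but reports write/append opens.
local function wrapFs(fsLib, onModified)
  -- Everything except open falls through to the original API.
  local wrapped = setmetatable({}, { __index = fsLib })

  function wrapped.open(path, mode)
    local handle, err = fsLib.open(path, mode)
    -- Modes containing "w" or "a" can change the file's contents.
    if handle and mode and string.find(mode, "[wa]") then
      onModified(path)
    end
    return handle, err
  end

  return wrapped
end
```

The callback would be the place to update the modified timestamp. It should write the metadata file through the original fs table rather than the wrapped one, otherwise each metadata write would trigger another update.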




