



Checking for an escaped quote


15 replies to this topic

#1 Engineer

  • Members
  • 1,378 posts
  • Location: The Netherlands

Posted 21 September 2014 - 10:24 PM

Hello folks,

I'm stuck on a parser I am writing. It is basically a recursive descent parser that reads its input character by character. That is going well, but I have hit a problem with parsing a string like this:
blabla bla "I need this part \" and this part" blablal
I know when to start reading the string: when the parser hits the first double quote. It then has to continue to the closing quote, because a string is enclosed in a pair of them. In this particular case it simply stops at the second double quote, because I see no real difference between a plain quote and an escaped quote.

To clarify my rather vague description, I need this output when I print it to the console:
I need this part " and this part

I really consider patterns a last resort, because they would break the tokenizer and cost me a lot of rework. So how would I tell an escaped quote from one that isn't?

#2 theoriginalbit

    Semi-Professional ComputerCrafter

  • Moderators
  • 7,332 posts
  • Location: Australia

Posted 22 September 2014 - 02:54 AM

What have you got so far? I assume it is thinking that the escaped quote is the end of the string?

#3 Engineer


Posted 22 September 2014 - 07:04 AM

theoriginalbit, on 22 September 2014 - 02:54 AM, said:

What have you got so far? I assume it is thinking that the escaped quote is the end of the string?
Technically I have nothing for this part yet, because I had already run some tests and everything stopped at the quoted string. So officially I have nothing, but it would behave the way you describe. It's not hard to do with a tokenizer; it simply stops at any escaped quote (it does distinguish between " and ', but that's obvious).

#4 theoriginalbit


Posted 22 September 2014 - 07:14 AM

Well, if you are doing it letter by letter, you could simply keep a boolean flag that is set when you encounter a \ and then ignore the next character. Alternatively, validate that the next character is valid after a backslash, since this is also a valid string:
'hello \" world'
Escaped quotes don't only have to appear inside a double-quoted string.
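A minimal sketch of the flag idea from this post, assuming the input really contains the backslash (for example, text read from a file). `readQuoted` and its return values are hypothetical names for illustration, not part of anyone's parser in this thread. It remembers which quote opened the string, so a \" inside a single-quoted string is handled too.

```lua
-- Hypothetical helper: read one quoted string from `src`, starting at the
-- opening quote at index `start`. Returns the contents and the index just
-- past the closing quote, or nil if the string is never closed.
local function readQuoted(src, start)
  local delimiter = src:sub(start, start)  -- either " or '
  local parts = {}
  local escaped = false                    -- true right after a backslash

  for i = start + 1, #src do
    local c = src:sub(i, i)
    if escaped then
      parts[#parts + 1] = c                -- keep the escaped character as-is
      escaped = false
    elseif c == "\\" then
      escaped = true                       -- drop the backslash itself
    elseif c == delimiter then
      return table.concat(parts), i + 1    -- unescaped delimiter ends the string
    else
      parts[#parts + 1] = c
    end
  end
  return nil                               -- ran out of input
end

-- Long brackets keep the backslash in the data instead of letting Lua consume it.
local input = [[blabla bla "I need this part \" and this part" blablal]]
local first = input:find('"', 1, true)
print(readQuoted(input, first))
```

Because the loop is bounded by the length of the input, a missing closing quote ends the scan with nil instead of running forever.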

#5 Bomb Bloke

    Hobbyist Coder

  • Moderators
  • 7,099 posts
  • Location: Tasmania (AU)

Posted 22 September 2014 - 07:27 AM

I'd go with something along these lines:

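-- Sample input; the literal below holds: blabla bla "I need this part \" and this part" blablal
local myData = "blabla bla \"I need this part \\\" and this part\" blablal"

local foundInQuotes, inQuotes, i = "", false, 1

while i <= #myData do
	local char = myData:sub(i, i)

	if char == "\\" then
		-- Escape sequence: skip the backslash and look at the character after it.
		i = i + 1
		local escapedChar = myData:sub(i, i)

		if inQuotes then
			if escapedChar == "n" then
				foundInQuotes = foundInQuotes .. "\n"
			else
				foundInQuotes = foundInQuotes .. escapedChar
			end
		end

	elseif char == "\"" then
		-- An unescaped quote opens or closes the string.
		inQuotes = not inQuotes

	elseif inQuotes then
		foundInQuotes = foundInQuotes .. char
	end

	i = i + 1
end

print(foundInQuotes) --> I need this part " and this part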

Though this doesn't deal with the single-quote matter BIT points out, I'm sure you can see how to modify it if you want that functionality.

Edited by Bomb Bloke, 22 September 2014 - 07:29 AM.


#6 Engineer


Posted 22 September 2014 - 07:42 AM

theoriginalbit, on 22 September 2014 - 07:14 AM, said:

Well, if you are doing it letter by letter, you could simply keep a boolean flag that is set when you encounter a \ and then ignore the next character. Alternatively, validate that the next character is valid after a backslash, since this is also a valid string:
'hello \" world'
Escaped quotes don't only have to appear inside a double-quoted string.
The thing is, \" is only one character long, so I cannot detect the backslash. I will get back to this topic once I am at a computer and can test some more.

Bomb Bloke and BIT, thank you both for your help so far.

#7 Engineer


Posted 22 September 2014 - 01:03 PM

I have got the following code:
Spoiler
However, this is not suitable for a parser: when a closing quote is missing, it errors with "too long without yielding" (TLWY). That is one point I will have to fix, but this is mostly for testing purposes.

I did the following test with this function in mind:
local t = {}
t.reader = StringReader.new( '"bla bla bla \" blabla"' )
print(readString(t))
This does what I thought initially, it prints the following to the console (without quotes):
"bla bla bla "

I hope you guys have more ideas or improvements, but remember that this is mostly a test function. Here is some information on the StringReader object:
Spoiler

Thanks in advance and for the effort

#8 MKlegoman357

  • Members
  • 1,170 posts
  • Location: Kaunas, Lithuania

Posted 22 September 2014 - 02:48 PM

Just an idea, but I think it's not completely your fault. The way you define this string:

'"bla bla bla \" blabla"'

Well, the Lua parser converts it to this, if I'm not mistaken:

'"bla bla bla " blabla"'

That is probably why your other attempts failed when they tried to find the backslash. To produce a string with a backslashed double quote, inside double quotes you should use this:

"bla bla \\\" bla"

...and then the Lua parser will convert it to this:

"bla bla \" bla"

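The point of this post can be checked directly. A small sketch, with variable names chosen only for illustration, comparing what three ways of writing the literal actually store:

```lua
-- In a quoted literal, Lua consumes the backslash while parsing the source.
local a = '"bla \" bla"'     -- \" becomes a plain "
local b = '"bla \\" bla"'    -- \\ becomes one backslash, followed by "
local c = [["bla \" bla"]]   -- long brackets do no escape processing

print(#'\"', #'\\"')         -- lengths 1 and 2
print(a:find("\\", 1, true)) -- nil: no backslash survives in a
print(b == c)                -- true: both hold a backslash before the quote
```

Writing test input with long brackets, or reading it from a file, is the easiest way to be sure the backslash is really in the data the parser sees.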

#9 Engineer


Posted 22 September 2014 - 03:48 PM

MKlegoman357, on 22 September 2014 - 02:48 PM, said:

Just an idea, but I think it's not completely your fault. The way you define this string:

'"bla bla bla \" blabla"'

Well, the Lua parser converts it to this, if I'm not mistaken:

'"bla bla bla " blabla"'

That is probably why your other attempts failed when they tried to find the backslash. To produce a string with a backslashed double quote, inside double quotes you should use this:

"bla bla \\\" bla"

...and then the Lua parser will convert it to this:

"bla bla \" bla"
That's obvious, because \\ escapes the backslash, so the string essentially contains a backslash. But what I'm trying to do is detect an escaped quote, which is: "\""
For the record, I don't define the string myself; this is actually a JSON parser. Of course you could ask why anyone would use "djsfnhk\" ldejfnhkdf" as a key, but I want it to be perfect: it should work as long as the input is a valid string.

Thanks for pointing it out though, any effort is appreciated! :)
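Since the input here is JSON, the escapes can be decoded with a lookup table. A rough sketch, assuming only the single-character escapes that JSON defines (\uXXXX is deliberately left out); `decodeJsonString` and `ESCAPES` are hypothetical names, not taken from the parser discussed in this thread. A missing closing quote is reported with an error message instead of being searched for indefinitely.

```lua
-- Single-character JSON escapes and the characters they stand for.
-- (\uXXXX escapes are left out of this sketch.)
local ESCAPES = {
  ['"'] = '"', ["\\"] = "\\", ["/"] = "/",
  b = "\b", f = "\f", n = "\n", r = "\r", t = "\t",
}

-- Hypothetical helper: decode the JSON string whose opening quote is at `pos`.
-- Returns the decoded text and the index after the closing quote,
-- or nil plus an error message.
local function decodeJsonString(src, pos)
  local out = {}
  local i = pos + 1
  while i <= #src do
    local c = src:sub(i, i)
    if c == '"' then
      return table.concat(out), i + 1
    elseif c == "\\" then
      local mapped = ESCAPES[src:sub(i + 1, i + 1)]
      if not mapped then
        return nil, "unsupported escape at position " .. i
      end
      out[#out + 1] = mapped
      i = i + 2                -- skip the backslash and the escaped character
    else
      out[#out + 1] = c
      i = i + 1
    end
  end
  return nil, "unterminated string starting at position " .. pos
end

-- Long brackets keep the backslashes in the test data.
print(decodeJsonString([["djsfnhk\" ldejfnhkdf"]], 1))
print(decodeJsonString([["no closing quote]], 1))
```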

#10 Engineer


Posted 22 September 2014 - 05:50 PM

I have found a possible lead to a solution. If I used patterns for this, it could work, because the following works like a charm:

local s = "\"This is \"a string\" with escaped\" quotes\""
print((s:gsub( "\"(.*)\"", "%1"))) -- extra parentheses drop gsub's second return value (the match count)
--> This is "a string" with escaped" quotes
However, this would completely bypass my tokenizer, although I could work around that and make it work properly. My question now is: given this information, can I still use my tokenizer properly and detect escaped quotes?
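A rough illustration of why a single pattern is awkward for this, assuming input that really contains a backslash before the inner quote (written with long brackets so Lua keeps it). The sample string is made up for the example.

```lua
-- Test data with a real backslash, written with long brackets.
local s = [[key "first \" part" and "second"]]

-- Greedy: .* runs to the LAST quote, swallowing both strings.
local greedy = s:match('"(.*)"')
print(greedy)

-- Lazy: .- stops at the FIRST quote it meets, even though it is escaped.
local lazy = s:match('"(.-)"')
print(lazy)
```

Neither form knows that a quote preceded by a backslash should not end the string, which is why a character-by-character scan fits this job better.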

#11 Bomb Bloke


Posted 23 September 2014 - 02:27 AM

One problem with that string is that it doesn't contain escaped quotes. You used escape characters to put the quotes in there, but the escape characters themselves aren't inserted.

I'm confused as to where you're getting your strings from, and the exact format they should be in. Let's say you were reading them from a text file - what, verbatim, would you put in that file?

#12 Engineer


Posted 23 September 2014 - 02:00 PM

I just realized something which makes this unnecessary to even try. If one escapes a quote it is for the compiler/parser, not the string itself.

I'm a fool sometimes xD

#13 MKlegoman357


Posted 23 September 2014 - 04:40 PM

Lol, that was exactly what I mentioned to you in my first post :D

#14 Engineer


Posted 23 September 2014 - 08:30 PM

MKlegoman357, on 23 September 2014 - 04:40 PM, said:

Lol, that was exactly what I mentioned to you in my first post :D
I'm assuming I read it too quickly or something, but I completely missed it. I should think and read more thoroughly next time.

Thank you Bomb Bloke, MKlegoman357 and theoriginalbit for the help you guys provided.
It really, really is appreciated!

#15 theoriginalbit


Posted 23 September 2014 - 11:36 PM

Engineer, on 23 September 2014 - 02:00 PM, said:

I just realized something which makes this unnecessary to even try. If one escapes a quote it is for the compiler/parser, not the string itself.
I wish I'd've come back and read this thread sooner. I assumed you already knew this, hence my suggestion of the boolean flag.

#16 Engineer


Posted 24 September 2014 - 07:20 PM

theoriginalbit, on 23 September 2014 - 11:36 PM, said:

I wish I'd've come back and read this thread sooner. I assumed you already knew this, hence my suggestion of the boolean flag.
I should have known that, but I just didn't realize it. It's worth saying I literally did this:
Spoiler





