Sunday, July 8, 2012

Regex: Identifying paths

Hi again and thank you for visiting my blog.

I've been working on a project for three months now. Like many other projects I'm working on or have worked on, it started from just an idea.
One of the challenges I met in this project was to come up with a regular expression that identifies whether a given part of a string is a full path, a part of a path, or a relative path.
Remember the TAB support in the Windows command window (if you don't know it, try it): you type a relative or full path followed by some prefix, and on each successive press of the TAB key, the names of the files and folders that are in the specified path and start with the typed prefix are placed one after another, like an auto-completion feature.
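To make the TAB behaviour concrete, here is a minimal sketch of it. It is written in Python rather than .NET only to keep it short, and the function names `completions` and `cycle_completions` are hypothetical, not taken from my project. It also assumes case-insensitive matching, as in the Windows command window.

```python
import itertools
import os


def completions(directory, prefix):
    """Return the sorted entry names in `directory` that start with `prefix`.

    Matching ignores case, which is an assumption modelled on Windows.
    An empty `directory` means the current directory.
    """
    wanted = prefix.lower()
    names = os.listdir(directory or ".")
    return sorted(name for name in names if name.lower().startswith(wanted))


def cycle_completions(directory, prefix):
    """Return an iterator that yields one match per TAB press, wrapping around.

    If nothing matches, the iterator is empty.
    """
    return itertools.cycle(completions(directory, prefix))
```

Each call to `next()` on the iterator returned by `cycle_completions` corresponds to one more press of TAB.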
So I finally came up with the following regular expression:
"(((((^\w:|\s\w:)|\.\.?)(\/|\\))?(([\w.]+( [\w.]+)?)(\/|\\))*)|((((^\w:|\s\w:)|\.\.?)(\/|\\))?(([\w.]+([\w.]+)?)(\/|\\))*))([\w\.]*)\t"
Because it's quite complex, the processing took really long. First I tried to compile the regular expression into a separate assembly and then use the compiled one, but unfortunately that made the performance even worse than before. Eventually I understood that I don't need all the groups defined in the pattern, so I marked the ones I don't need as non-capturing using (?:...), and that improved the performance a lot. So here is what I ended up with:

"(?:(?:(?:(?:(^\w:|\s\w:)|\.\.?)(?:\/|\\))?(?:(?:[\w.]+(?: [\w.]+)?)(?:\/|\\))*)|(?:(?:(?:(^\w:|\s\w:)|\.\.?)(?:\/|\\))?(?:(?:[\w.]+(?:[\w.]+)?)(?:\/|\\))*))([\w\.]*)\t"

Although the compiled regular expression should execute faster than the non-compiled one, I haven't gone into the details of why it didn't, but that is what I observed. I'm pretty sure that for shorter regular expressions compiling will work faster, but not in this case.
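Here is a small sketch of how the final pattern can be used to split the typed text into the directory part and the prefix to complete. It is in Python rather than .NET, the names `PATH_BEFORE_TAB` and `split_completion_request` are hypothetical, and it assumes that an outer `(?:` wraps both alternatives so that the parentheses balance. Only three capturing groups remain: the drive (once per alternative) and the prefix.

```python
import re

# Assumption: an outer (?: ... ) wraps both alternatives so the parentheses balance.
# Capturing groups: 1 and 2 = drive (one per alternative), 3 = prefix before TAB.
PATH_BEFORE_TAB = re.compile(
    r"(?:(?:(?:(?:(^\w:|\s\w:)|\.\.?)(?:\/|\\))?(?:(?:[\w.]+(?: [\w.]+)?)(?:\/|\\))*)"
    r"|(?:(?:(?:(^\w:|\s\w:)|\.\.?)(?:\/|\\))?(?:(?:[\w.]+(?:[\w.]+)?)(?:\/|\\))*))"
    r"([\w\.]*)\t"
)


def split_completion_request(text):
    """Return (directory, prefix) for the path typed just before a TAB.

    Returns None when `text` contains no TAB-terminated path.
    """
    match = PATH_BEFORE_TAB.search(text)
    if match is None:
        return None
    prefix = match.group(3)
    whole = match.group(0)
    # Drop the prefix and the trailing TAB, then the leading space that
    # the \s\w: alternative may have consumed in front of a drive letter.
    directory = whole[: len(whole) - len(prefix) - 1].lstrip()
    return directory, prefix
```

For example, for the input `cd C:\Users\Pub` followed by a TAB, the function returns the directory `C:\Users\` and the prefix `Pub`.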

Hope the regex will be helpful.

Enjoy .Net Development !!!
