Wednesday, May 16, 2007

Oniguruma and Named Regexes in Ruby 1.8

Unlike Python and C# (and probably others), Ruby 1.8 does not support named groups. They are available in 1.9 though. Or so I hear.

What are named groups? Basically they let you pull results out of the match object using the nice names you embedded in the regex, instead of by position. Since Blogger hates anything with <>'s you can see an example here. Look near the synopsis.
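For a rough idea of what this looks like, here is a minimal sketch using the named-group syntax built into Ruby's own Regexp class in 1.9 and later. It does not use the 1.8 gem's API, and DATE_RE and parse_date are names made up for illustration.

```ruby
# Named groups with Ruby's built-in Regexp (assumes Ruby 1.9 or later,
# not the Oniguruma gem for 1.8).
# DATE_RE and parse_date are hypothetical names chosen for this example.
DATE_RE = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

def parse_date(text)
  m = DATE_RE.match(text)
  return nil unless m
  # Each group is looked up by its name rather than by its position.
  { :year => m[:year], :month => m[:month], :day => m[:day] }
end

p parse_date("Posted on 2007-05-16")
p DATE_RE.names
```

The point is that m[:year] keeps working if you later add or reorder groups, where m[1] would silently start pointing at something else.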

So how do you get this in 1.8.6?

Windows users have it easy: just install the win32 gem. OSX (and I assume other UNIXes) requires a few extra steps, and it is sort of confusing. There are lots of versions of the Oniguruma C library to choose from (do I really want to download stuff hosted on Geocities Japan?), and there is a way to install it that requires recompiling your Ruby, but that only works for 1.8.4, so I didn't bother.

So here is what I did to get it working on OSX (PPC) with 1.8.6:

1. Download 4.6.2 of the C library. Configure it with whatever prefix you are using for Ruby (I use /my) and run make install.

2. Download version 1.10 of the gem source tarball and do the standard ruby extconf.rb dance within ext. However, before you run make:

Change INCFLAGS in the generated Makefile so it can find oniguruma.h.

Mine looks like this because Ruby is installed in /my:

INCFLAGS = -I. -I. -I/my/include -I/my/lib/ruby/1.8/powerpc-darwin8.9.0 -I.

If you don't do this you will get:


gcc -I/my -I. -I. -I/my/lib/ruby/1.8/powerpc-darwin8.9.0 -I. -fno-common -Wall -c oregexp.c
oregexp.c:2:23: error: oniguruma.h: No such file or directory
oregexp.c:9: error: parse error before 'regex_t'
oregexp.c:9: warning: no semicolon at end of struct or union
oregexp.c:10: warning: type defaults to 'int' in declaration of 'ORegexp'
oregexp.c:10: warning: data definition has no type or storage class
oregexp.c:15: error: parse error before '*' token
oregexp.c: In function 'oregexp_free':
oregexp.c:16: warning: implicit declaration of function 'onig_free'
oregexp.c:16: error: 'oregexp' undeclared (first use in this function)
oregexp.c:16: error: (Each undeclared identifier is reported only once
oregexp.c:16: error: for each function it appears in.)
oregexp.c: In function 'oregexp_allocate':
oregexp.c:21: error: 'oregexp' undeclared (first use in this function)
oregexp.c: At top level:
oregexp.c:27: error: parse error before '*' token
oregexp.c:27: warning: return type defaults to 'int'
oregexp.c: In function 'int2encoding':
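One way to avoid that wall of errors is to confirm where oniguruma.h actually landed before touching the Makefile. The following is only a sketch: find_header is a hypothetical helper, and the candidate directories are assumptions (/my/include matches the prefix used in this post; the others are common defaults).

```ruby
# Hypothetical helper: return the first directory that contains the
# given header file, or nil if none of them do.
def find_header(header, dirs)
  dirs.find { |dir| File.file?(File.join(dir, header)) }
end

# Assumed locations; adjust to the prefix you passed to configure.
candidates = ["/my/include", "/usr/local/include", "/usr/include"]

dir = find_header("oniguruma.h", candidates)
if dir
  puts "found oniguruma.h in #{dir}; INCFLAGS needs -I#{dir}"
else
  puts "oniguruma.h not found in #{candidates.join(', ')}"
end
```

Whatever directory it reports is the one that needs to appear as a -I entry in INCFLAGS.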


If you compile it successfully you'll see the library (oregexp.bundle, whatever that is) put in /my/lib/ruby/site_ruby/1.8/powerpc-darwin8.9.0 (or whatever your path is).

However, require 'oniguruma.rb' will still fail until you do this:

cp oniguruma.rb /my/lib/ruby/site_ruby/1.8/
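If you are not sure whether the copy went to the right place, a quick check of Ruby's load path can tell you. This is a sketch only; load_path_hits is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper: list the load path directories that contain a file.
# require searches $LOAD_PATH, so an empty result means require will fail.
def load_path_hits(file, path = $LOAD_PATH)
  path.select { |dir| File.file?(File.join(dir.to_s, file)) }
end

hits = load_path_hits("oniguruma.rb")
if hits.empty?
  puts "oniguruma.rb is not on the load path; require will fail"
else
  puts "oniguruma.rb found in: #{hits.join(', ')}"
end
```

Run it with the same ruby binary you installed into, since each Ruby installation has its own site_ruby directory.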


But if you did get it installed, you can run the test suite:


franz-g4:/tmp root# ruby test_oniguruma.rb
4.6.2
Loaded suite test_oniguruma
Started
..............................................
Finished in 0.046221 seconds.

46 tests, 105 assertions, 0 failures, 0 errors


But of course the example from the README doesn't work (either on OSX or Windows) in typical Ruby fashion.

Simple, easy, and fun!

3 comments:

dichodaemon said...

Hi Matt,

I am the administrator of the oniguruma bindings. It's great that you have posted this! Since I do not have access to a Mac, I have been unable to correct those problems. For now, I am adding a link to your instructions on the web page.

Dizan

J.J. said...

the gem stinks.
Good gems tell you if your platform is unsupported by the gem, or give you an option to choose the platform.

Good installation is as important as something working after installation.

Wolf said...

Regular expressions are really wonderful for parsing HTML or matching patterns. I use them a lot when I code. Actually, when I learn any new language, the first thing I check is whether it supports regex. I feel at ease when I find that it does.

http://icfun.blogspot.com/2008/04/ruby-regular-expression-handling.html

Here is a post about Ruby regex. I wrote it when I was first learning Ruby regex, so it should be helpful for new coders.