inject, inject, inject

If inject had a better name, then it would be easier for us rookies to remember that it’s the right way to accumulate values across an array:

irb(main):001:0> (1..10).inject(0) { |s,v| s + v }
=> 55

Might I suggest something along the lines of:

  • accumulate
  • assemble
  • gather (my favorite)
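Ruby versions from 1.8.7 on include `reduce` as a built-in alias of `inject`. The `gather` name suggested above is only a one-line alias away. The following is a minimal sketch. `gather` is a hypothetical name and is not part of Ruby.

```ruby
# Enumerable#reduce is a built-in alias of Enumerable#inject.
# `gather` is a hypothetical alias, added here only to illustrate the idea.
module Enumerable
  alias_method :gather, :inject
end

# The block returns the new running total; there's no need to assign to the accumulator.
total_inject = (1..10).inject(0) { |sum, v| sum + v }   # => 55
total_reduce = (1..10).reduce(0) { |sum, v| sum + v }   # => 55
total_gather = (1..10).gather(0) { |sum, v| sum + v }   # => 55

# Passing the method name as a symbol works the same way.
total_symbol = (1..10).gather(:+)                       # => 55
```

Reopening a core module like Enumerable is fine for experimenting in irb, but it's best kept out of shared code.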

Ruby Variable Scope and Blocks

The blocks used by iterators (such as loop and each) are a little different. Normally, the local variables created in these blocks are not accessible outside the block.

[ 1, 2, 3 ].each do |x|
  y = x + 1
end
[ x, y ]

produces:

prog.rb:4: undefined local variable or method `x' for # (NameError)

However, if at the time the block executes a local variable [that] already exists with the same name as that of a variable in the block, the existing local variable will be used in the block. Its value will therefore be available after the block finishes.

My first impression was that it was weird for blocks to have access to variables outside the block. But after working with the language, I realized that this was perfectly natural and consistent with blocks being closures, and that the weird thing was that local variables created in the block were not accessible outside it. For example, it's easy to find the sum or product of an array of numbers when the block has access to variables that have already been defined:

irb(main):001:0> sum = 0
=> 0
irb(main):002:0> (1..10).each { |i| sum += i }
=> 1..10
irb(main):003:0> sum
=> 55

And it's weird that this:

irb(main):001:0> if false then last = nil end
=> nil
irb(main):002:0> (1..10).each { |i| last = i }
=> 1..10
irb(main):003:0> last
=> 10

behaves differently than this:

irb(main):001:0> (1..10).each { |i| last = i }
=> 1..10
irb(main):002:0> last
NameError: undefined local variable or method `last' for main:Object
        from (irb):2

The good thing is that it jumps up and bites you right away.
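The `if false` trick works because Ruby decides what counts as a local variable when it parses the code, not when it runs it. The sketch below makes this visible with `defined?`. It also shows the more common habit of initializing the variable before the block. Both method names are made up for illustration.

```ruby
# Ruby decides what is a local variable while parsing, not while running.
# Once the parser has seen an assignment, the variable exists (as nil),
# even if that assignment never executes.
def parser_declared
  if false then never_run = 1 end   # parsed, but never executed
  [defined?(never_run), never_run]  # => ["local-variable", nil]
end

# The everyday version of the same idea: initialize the variable before
# the block so that the block's assignments land in the outer local.
def last_of(enum)
  last = nil
  enum.each { |i| last = i }
  last
end

last_of(1..10)   # => 10
```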

22 Mar: inject, inject, inject – OK, summing values was a bad example. I still maintain that having access to local variables outside the block is a better fit to the language.

Downloading Comments and Trackback Pings

I have updated my comment and trackback package to allow comments and trackbacks to be downloaded as XML. The download link is available from the admin interface (presented at the bottom of the index page).

Features

  • Composite Trackback RSS feed
  • Composite Comment RSS feed
  • HTML Sanitization from Brad Choate's Sanitize Plugin for Movable Type.
  • Tag Moderation for Trackback and Comments – trackback pings and comments are immediately available with most HTML tags removed (p is allowed for readability). Sanitized tags are displayed upon approval.
  • Outbound Link Blacklist – trackback pings and comments that contain outbound links to banned sites are blocked. The user is responsible for providing a function identifying banned sites.
  • Rejection of trackback pings and comments that contain an excessive number of links.
  • Trackback and Comment deletion (unfortunately, this does not propagate to the RSS feed).
  • Comment Editing

Requirements

  • Web server capable of running CGI scripts.
  • Perl with the following Perl Modules (I believe that these are core modules as of Perl 5.6.0):
    • File::Spec
    • Storable
    • CGI
    • CGI::Cookie

Download standalone comments and trackback package.

Sanitize HTML in Ruby

For my first foray into Ruby, I've created an HTML sanitization method. It is partially based on Brad Choate's Perl sanitize_html (used in my standalone comments and trackback package). While this was not a good exercise in learning Ruby objects, it was a good exercise in Ruby regular expressions and String replacement. Without further ado, here's my annotated sanitize_html in Ruby:

A basic method declaration. The default set of allowed tags and attributes is provided as the default value for the okTags argument. The soloTags array contains tags that don’t require a closing tag.

def sanitize_html( html, okTags='a href, b, br, i, p' )
  # no closing tag necessary for these
  soloTags = ["br"]

We begin by building an allowed HTML tag hash. The hash keys are the allowed HTML tags and the hash values are arrays of allowed attributes for the respective tag. Here's the blow-by-blow breakdown in irb:

irb(main):001:0> okTags = 'a href, b, br, i, p'
=> "a href, b, br, i, p"
irb(main):002:0> tags = okTags.downcase.split(',')
=> ["a href", " b", " br", " i", " p"]
irb(main):003:0> tags.collect!{ |s| s.split(' ') }
=> [["a", "href"], ["b"], ["br"], ["i"], ["p"]]
irb(main):004:0> allowed = Hash.new
=> {}
irb(main):005:0> tags.each do |s|
irb(main):006:1* key = s.shift
irb(main):007:1> allowed[key] = s
irb(main):008:1> end
=> [["href"], [], [], [], []]
irb(main):009:0> allowed
=> {"a"=>["href"], "b"=>[], "p"=>[], "br"=>[], "i"=>[]}

And here’s the corresponding code:

  # Build hash of allowed tags with allowed attributes
  tags = okTags.downcase().split(',').collect!{ |s| s.split(' ') }
  allowed = Hash.new
  tags.each do |s|
    key = s.shift
    allowed[key] = s
  end
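In current Ruby, `each_with_object` combined with splat destructuring can build the same hash in one pass, without shifting elements out of the intermediate arrays. This is a sketch only. `build_allowed` is a hypothetical helper and not part of the method above.

```ruby
# Hypothetical helper: parse an okTags-style string such as
# 'a href, b, br, i, p' into { tag => [allowed attributes] }.
def build_allowed(ok_tags)
  ok_tags.downcase.split(',').each_with_object({}) do |spec, allowed|
    tag, *attrs = spec.split(' ')          # split(' ') also drops leading blanks
    allowed[tag] = attrs unless tag.nil?   # skip empty entries such as 'a,,b'
  end
end

build_allowed('a href, b, br, i, p')
# => {"a"=>["href"], "b"=>[], "br"=>[], "i"=>[], "p"=>[]}
```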

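Next, we perform a substitution on every <…> element. The regular expression is non-greedy (the ? after .*), so each match stops at the first >. It also uses the m flag, which in Ruby lets . match newlines, so tags that span lines are still caught.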
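Here is a quick check of why both the `?` and the `m` matter, using a made-up sample string. Without `m`, a tag that wraps onto a second line is never matched at all, so it would slip past the substitution untouched. Without `?`, a single match swallows everything from the first `<` to the last `>`.

```ruby
html = "<b>bold</b>\n<a\nhref=\"x\">link</a>"   # made-up sample input

lazy_m = html.scan(/<.*?>/m)  # non-greedy, and . may cross the newline
lazy   = html.scan(/<.*?>/)   # non-greedy, but . stops at newlines
greedy = html.scan(/<.*>/m)   # greedy: one match from the first < to the last >

p lazy_m   # 4 tags, including the two-line <a ...> tag
p lazy     # 3 tags: the two-line <a ...> tag is missed entirely
p greedy   # 1 match: the whole string
```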

  # Analyze all <> elements
  stack = Array.new
  result = html.gsub( /(<.*?>)/m ) do | element |

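It's a closing tag. If it's allowed and the matching opening tag is on the stack, pop the stack down to that tag, emitting a closing tag for each entry popped. Any tags still open inside it get closed first, which keeps the output tags in matched pairs.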

    if element =~ /\A<\/(\w+)/ then
      # </tag>
      tag = $1.downcase
      if allowed.include?(tag) && stack.include?(tag) then
        # If allowed and on the stack
        # Then pop down the stack
        top = stack.pop
        out = "</#{top}>"
        until top == tag do
          top = stack.pop
          out << "</#{top}>"
        end
        out
      end
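The stack is what repairs bad nesting. For example, a `</b>` that arrives while `<i>` is still open closes the `<i>` first. Here is that popping logic pulled out on its own. `close_through` is a hypothetical helper. It assumes, as the branch above checks before popping, that `tag` is on the stack.

```ruby
# Hypothetical helper mirroring the closing-tag branch above.
# Precondition (checked by the caller above): tag is somewhere on the stack.
def close_through(stack, tag)
  closed = []
  loop do
    top = stack.pop
    closed << top
    break if top == tag
  end
  closed.map { |t| "</#{t}>" }.join
end

open_tags = ["p", "b", "i"]        # <p><b><i> are currently open
close_through(open_tags, "b")      # => "</i></b>"
open_tags                          # => ["p"]
```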

It’s a solo tag. Pass through if allowed.

    elsif element =~ /\A<(\w+)\s*\/>/
      # <tag />
      tag = $1.downcase
      if allowed.include?(tag) then
        "<#{tag} />"
      end

It's an opening tag. If the tag is allowed, push it onto the stack when it requires a closing tag. If it has no allowed attributes, replace it with a bare opening tag. Otherwise, sweep through the matched element and keep only the allowed attribute-value pairs.

    elsif element =~ /\A<(\w+)/ then
      # <tag ...>
      tag = $1.downcase
      if allowed.include?(tag) then
        if ! soloTags.include?(tag) then
          stack.push(tag)
        end
        if allowed[tag].length == 0 then
          # no allowed attributes
          "<#{tag}>"
        else
          # allowed attributes?
          # $' holds the text after the last successful match, so each pass
          # resumes scanning just past the previous attribute
          out = "<#{tag}"
          while ( $' =~ /(\w+)=("[^"]+")/ )
            attr = $1.downcase
            valu = $2
            if allowed[tag].include?(attr) then
              out << " #{attr}=#{valu}"
            end
          end
          out << ">"
        end
      end
    end
  end
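The `while ( $' =~ … )` loop works because `$'` always holds the text after the most recent successful match. `String#scan` can express the same sweep without relying on that global state. The sketch below follows the same rules as the original, so only double-quoted, non-empty values are recognized. `filter_attributes` is a hypothetical name.

```ruby
# Hypothetical helper: rebuild an opening tag, keeping only allowed attributes.
# `rest` is the text that follows the tag name, e.g. ' href="..." onclick="..."'.
def filter_attributes(tag, rest, allowed_attrs)
  pairs = rest.scan(/(\w+)=("[^"]+")/)            # [[name, quoted_value], ...]
  kept  = pairs.select { |name, _| allowed_attrs.include?(name.downcase) }
  "<#{tag}" + kept.map { |name, value| " #{name.downcase}=#{value}" }.join + ">"
end

filter_attributes("a", ' HREF="http://example.com/" onclick="steal()"', ["href"])
# => "<a href=\"http://example.com/\">"
```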

Our previous substitution only touched matched <…> elements. Now clean up any >'s that precede the first <…> element and any <'s that follow the last <…> element:

  # eat up unmatched leading >
  while result.sub!(/\A([^<]*)>/m) { $1 } do end

  # eat up unmatched trailing <
  while result.sub!(/<([^>]*)\Z/m) { $1 } do end
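The two `while … do end` loops repeat `sub!` until it returns `nil`, which happens once there is nothing left to replace. One compact spelling of the same loop is the statement modifier `nil while …`. The sketch below reuses the original regular expressions and runs them on made-up strings. `strip_strays` is a hypothetical name.

```ruby
# Hypothetical helper reproducing the two cleanup loops above.
def strip_strays(text)
  s = text.dup
  nil while s.sub!(/\A([^<]*)>/m) { $1 }   # drop any > that precedes the first <
  nil while s.sub!(/<([^>]*)\Z/m) { $1 }   # drop any < with no > after it
  s
end

strip_strays("a > b <b>x</b> c < d")   # => "a  b <b>x</b> c  d"
strip_strays(">>ok<<")                 # => "ok"
```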

If there are any tags left in the stack, then append the appropriate closing tags to the string.

  # clean up the stack
  if stack.length > 0 then
    result << "</#{stack.reverse.join('></')}>"
  end

  result
end
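The final cleanup relies on a small `join` trick. Reversing the stack and joining it with `'></'` yields every missing closing tag, innermost first. Here it is isolated as a hypothetical `closing_tags` helper:

```ruby
# Hypothetical helper mirroring the stack cleanup at the end of sanitize_html.
def closing_tags(stack)
  return "" if stack.empty?
  "</#{stack.reverse.join('></')}>"
end

closing_tags(["p", "b", "i"])   # => "</i></b></p>"
closing_tags([])                # => ""
```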

April 5: Sanitize HTML in Ruby (cont)

Ruby Follies: do-end not do-done

As a longtime shell hacker, I find that one of the most difficult things about Ruby is remembering that it's do…end, not do…done.

in the shell: for i in *; do … done

in Ruby: while cond do … end

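The other thing to remember is that Ruby doesn't have a dedicated post-test loop keyword like C's do…while. Instead, combine the begin…end construct with a while/until modifier. The body then runs once before the condition is tested: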

begin … end while cond

begin … end until cond
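The difference matters when the condition is false from the start. A `begin…end while` runs its body once, but an ordinary `while` (or the modifier on a plain statement) may run it zero times. Here is a small sketch with made-up method names:

```ruby
# Post-test: the body runs once before the condition is checked.
def post_test_count(limit)
  count = 0
  begin
    count += 1
  end while count < limit
  count
end

# Pre-test: the modifier form on a plain statement checks first.
def pre_test_count(limit)
  count = 0
  count += 1 while count < limit
  count
end

post_test_count(0)   # => 1
pre_test_count(0)    # => 0
post_test_count(3)   # => 3
pre_test_count(3)    # => 3
```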

Fantasy Dream Job

I thought that the ESPN Dream job was pretty cool. But that was nothing compared to this:

This year, the winner of a select fantasy league, in which people use the statistics of real-life players to simulate games, will get an actual front-office job with the San Francisco Giants.

I never played fantasy baseball. And I’m on the wrong coast. But a year with Brian Sabean, crunching numbers and evaluating players, that would be something.

Why Ruby?

PHP wasn’t different enough. I was prepared to convert to PHP. But upon further review, I think that learning PHP is just learning PHP. It’s great at what it does, but it’s not going to lead to a new way of thinking about programming.

I'm queasy about significant whitespace in Python. Anyone who has worked with my code knows that I'm a stickler about proper indentation. And once upon a time, before I converted to Emacs, Python would have been attractive. But now that I rely on Emacs to manage my indentation for me, managing it myself seems backward.

I need to take a look at Ruby on Rails. I’m a sceptic, but the buzz about Ruby on Rails is hard to resist. And with first things coming first, it must be time to learn Ruby.

March Numerology

With the announcement of the tournament brackets, it's time for the annual complaining about seeds. Personally, I think that it's the matchups, not the seeding, that matter; the 1-seed doesn't get any extra points. And for my money, if you can't be a high seed (1-4), then 11 is the lucky number. Your road to the Sweet Sixteen goes through a 6-seed and a 3-seed: good teams, but flawed, the kind of teams that get beat by Cinderella on the way to the ball.

Stanford men get an 8-seed, just about the worst seed you can get. You're guaranteed a tough game against a 9-seed, and a 1-seed is penciled in for your second game. If Stanford gets past Mississippi State and Duke to the Sweet Sixteen, then I'll be celebrating like it's a national championship.

On the distaff side of things, the #1 ranked Stanford women get a 2-seed. And to make things worse, I don't really disagree. The poll punishes late losses, rewards late wins, and never really corrects for strength of schedule. In a down year [decade?] for the west coast, Stanford didn't earn a 1-seed.

That being said, what did Stanford do to deserve a projected matchup with 3-seed Connecticut? UConn started the year slow. But it looks like they've finally come to grips with basketball without Taurasi. Right now, I think that UConn is the strongest of the 3-seeds, and I wouldn't be surprised if they made the Final Four.

Survive and Advance

It's March. If you're a college basketball fan, then all you want is for your team to play another game. And Thursday night's 60-58 win over Washington St. makes Stanford a safe tournament pick; ESPN is projecting them as a 7-seed.

In previous years, Stanford has gone into the tournament as a top seed, with all the pressure and expectations that come with that seeding. This year, with a middling seed and a thin bench, every extra game will be gravy.

Rendering Comments in Radio

I was wondering how my Radio Userland weblog would look if comments were rendered onto the page. And since the only way to get rid of temptation is to yield to it, I put together a trial implementation.

This proof of concept retrieves the manually created comments sub-table for this post from the Radio object database and creates the corresponding text for inclusion on the rendered page. A real implementation would retrieve comment data from the comment server and store that data in the appropriate comment sub-table.

I have some issues with the results. The comments are rendered on all pages (main index, monthly archive and daily). I would like the comments to be rendered only on the daily pages, with the comment links on the index and monthly archive pages pointing to the rendered comments on the daily page. Unfortunately, I don't have a good plan to achieve that.

PS: I know that the counter in the comments link doesn’t match the number of comments on the page. Just another detail to be worked out.