Monday, 29 March 2010

Principle of least surprise

It's a pretty simple principle: things (APIs and the like) should do the least surprising thing. A reaction of 'What the f***' is a pretty good sign that the principle has been broken. Imagine my surprise when I found out what the following Groovy does:


def map = ["key1":"value1"]
map.get("missingKey","defaultValue")

print map

Output:
[key1:value1, missingKey:defaultValue]

The get with default method actually updates the map!
Surely that can't be a good thing to have a 'get' method update the underlying map?


// the preferred groovy idiom is to use the elvis operator
print map["missingKey"]?:"defaultValue"
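
For comparison, here is a small sketch of how another language keeps the read and the write apart. It uses Python rather than Groovy (my assumption, nothing in the example above depends on it), and the variable names are purely illustrative: the getter with a default leaves the map alone, and the call that inserts the default has a name that says so.

```python
# Python analogue of the Groovy example above; all names are illustrative.
settings = {"key1": "value1"}

# dict.get with a default is a pure read: the map is left untouched.
looked_up = settings.get("missingKey", "defaultValue")
after_get = dict(settings)

# dict.setdefault is the call that inserts the default, and its name says so.
inserted = settings.setdefault("missingKey", "defaultValue")
after_setdefault = dict(settings)

print(looked_up, after_get)
print(inserted, after_setdefault)
```

Both calls return the default, but only the second one changes the map, which is probably what most people would expect from a method called 'get'.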

Saturday, 31 October 2009

screen scraping using YQL

Just attended the Cambridge DevDay, which I really enjoyed.

Christian Heilmann talked about the Yahoo Query Language, a very powerful tool for querying not only Yahoo's datasets but arbitrary third-party ones too, plus a bit of URL fetching. The YQL stuff that Christian demoed was pretty slick, but the bit that really caught my eye was the following little snippet:

select * from html where url="http://finance.yahoo.com/q?s=yhoo" and
xpath='//div[@id="yfi_headlines"]/div[2]/ul/li/a'


Try it for yourself over at the YQL console.

So I thought I'd have a bit of a bash at screen scraping my SO profile page to see if I can show my answer list on my site.


<c:import url="http://query.yahooapis.com/v1/public/yql" var="feed">
  <c:param name="q">select * from html where url="http://stackoverflow.com/users/31480/gid" and
xpath='//div[@class="answer-summary"]'</c:param>
  <c:param name="format" value="xml"/>
</c:import>
<x:parse var="xml">${feed}</x:parse>
<ul class="so-answers">
  <x:forEach select="$xml/query/results/div[@class='answer-summary']" var="answer"
             end="10" >
    <li class="answer">
      <x:set var="votes" select="$answer/div[contains(@class,'answer-votes')]"/>
      <div class="<x:out select="$votes/@class"/>"
           title="<x:out select="$votes/@title"/>">
        <x:out select="$votes"/>
      </div>

      <x:set var="a" select="$answer//a[contains(@class,'answer-hyperlink')]"/>
      <div class="answer-link">
        <a href="http://stackoverflow.com/<x:out select="$a/@href"/>">
          <x:out select="$a"/>
        </a>
      </div>
    </li>
  </x:forEach>
</ul>

Check it out running

In theory I could point the import above directly at Stack Overflow, but the rather picky Xerces parser (used under the covers of the JSTL XML tags) complains bitterly about DTDs and all that jazz. The YQL fetch has the nice side effect of tidying up any HTML ugliness and spitting out easily parsable XML.

Tuesday, 2 June 2009

0.10 of logicalpractice-collections released

No really major changes to note. The main reason for the release is a packaging change that makes the library available via its own Maven repo; see the maven setup instructions.

Saturday, 21 March 2009

Python is just a lovely thing

I've used quite a few dynamic scripting languages over the last couple of years including groovy, ruby and python, but I keep coming back to python. I think this time it's due to Peter Butler's (a guy I worked with a while ago) complete love of the language and I think I'm starting to see why.


Over the last week I've been bashing away at improving the rather outdated www.logicalpractice.com, and it occurred to me that it would be a good idea to generate a sitemap XML for Google and the other search bots.


The following code is my solution. I'm sure it's not the best Python in the world, but I do just kinda like the look of it.




from __future__ import with_statement
import xmlbuilder
import sys
import os
from datetime import datetime
from xml.dom.minidom import parse as parseDom
from xml.dom.minidom import Node

def url_element(xml, loc,lastmod,changefreq="weekly", priority=0.5):
    with xml.url:
        if loc.startswith("http:"):
            xml.loc(loc)
        else:
            xml.loc("http://www.logicalpractice.com%s" % loc)

        xml.lastmod(lastmod.strftime("%Y-%m-%d"))
        xml.changefreq(changefreq)
        xml.priority(priority)

def lastmod(file_name):
    global basedir
    last_mod = os.path.getmtime(os.path.join(basedir,file_name))
    return datetime.fromtimestamp(last_mod)

basedir = os.path.join(os.path.dirname(sys.argv[0]), "..","..")
xml = xmlbuilder.builder(version="1.0",encoding="utf-8")

with xml.urlset(xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"):
    url_element(xml,"/",lastmod("index.jsp"),priority=1.0)
    url_element(xml,"/news.jsp", lastmod("news.jsp"), priority=0.8)
    url_element(xml,"/projects.jsp", lastmod("projects.jsp"), priority=0.5)
    url_element(xml,"/profile.jsp", lastmod("profile.jsp"), priority=0.5)

    # generate elements from the news.rss
    rss = parseDom(os.path.join(basedir,"news.rss"))
    for node in rss.getElementsByTagName("item"):
        link = node.getElementsByTagName("link")[0].firstChild.data
        strdate = node.getElementsByTagName("pubDate")[0].firstChild.data
        date = datetime.strptime(strdate, "%a, %d %b %Y %H:%M:%S +0000")
        url_element(xml, link, date, priority=0.5)

print xml

The xmlbuilder used is from Jonas Galvez, via GitHub; it seems a very simple and elegant solution for building XML documents.
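
If pulling in xmlbuilder isn't an option, roughly the same document can be built with the standard library alone. What follows is a minimal sketch, assuming Python 3 and xml.etree.ElementTree; the build_sitemap helper, the SITE constant and the sample entries (including their dates) are made up for illustration and are not part of the script above.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

SITE = "http://www.logicalpractice.com"
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Build sitemap XML from (loc, lastmod datetime, changefreq, priority) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")
        # Relative paths get the site root prepended, as url_element does above.
        full_loc = loc if loc.startswith("http:") else SITE + loc
        ET.SubElement(url, "loc").text = full_loc
        ET.SubElement(url, "lastmod").text = lastmod.strftime("%Y-%m-%d")
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = str(priority)
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical sample entries; a real run would read file mtimes and news.rss.
sitemap = build_sitemap([
    ("/", datetime(2009, 3, 21), "weekly", 1.0),
    ("/news.jsp", datetime(2009, 3, 20), "weekly", 0.8),
])
print(sitemap)
```

It is a bit more verbose than the nested with blocks, which is arguably the whole appeal of xmlbuilder.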



How do I know that python must be a good thing? Well anything that I get up at 5 in the morning to code a bit more of before work has to be a good thing.

Wednesday, 11 February 2009

Java assert - what I learnt today

I learnt something new today. I thought the following would be just fine:

File f = new File("foo.txt");

if( f.exists() )
    assert f.delete();

That all looked good to me, right up to the point that I discovered that an assert expression isn't even evaluated if asserts are not enabled, so the file never gets deleted. I had to change it to:

if( f.exists() ){
    boolean deleted = f.delete();
    assert deleted;
}
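
Java isn't alone in this. As a rough illustration in Python (my choice of language, the post itself is about Java), the built-in compile() accepts an optimize argument that mirrors running with or without the -O flag, so the effect can be seen in a single process. The SOURCE snippet and the side_effects helper are hypothetical names used only for this sketch.

```python
# A snippet whose assert carries a side effect, like `assert f.delete()` above.
SOURCE = """
calls = []
assert calls.append("ran") is None
"""

def side_effects(optimize):
    """Run SOURCE compiled at the given optimize level and return what it recorded."""
    namespace = {}
    exec(compile(SOURCE, "<assert-demo>", "exec", optimize=optimize), namespace)
    return namespace["calls"]

enabled = side_effects(0)   # asserts kept: the append inside the assert runs
disabled = side_effects(1)  # asserts stripped: the append never happens

print(enabled, disabled)
```

The fix is the same as in the Java version: do the work in its own statement, then assert on the result.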



Friday, 8 February 2008

running Surefire twice in a build

I've run into this a couple of times in my experience with Maven 2: a single-module web project, nothing too complex, just some servlets and a simple data access layer. What I wanted was to run both unit tests and integration tests, both written with JUnit. So initially I started with:


<plugin>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <skip>true</skip>
  </configuration>
  <executions>
    <execution>
      <id>integration-test</id>
      <phase>integration-test</phase>
      <goals>
        <goal>test</goal>
      </goals>
      <configuration>
        <skip>false</skip>
      </configuration>
    </execution>
  </executions>
</plugin>
This seemed like at least half the solution. It moves the test run to the integration-test phase, where both the unit tests and the integration tests run. Not so bad, I figured. I worked with this for a couple of months, but it kept bugging me. Last week I came up with:



<plugin>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <skip>true</skip>
  </configuration>
  <executions>
    <execution>
      <id>test</id>
      <phase>test</phase>
      <goals>
        <goal>test</goal>
      </goals>
      <configuration>
        <skip>false</skip>
        <excludes>
          <exclude>**/*IntegrationTest.java</exclude>
        </excludes>
      </configuration>
    </execution>
    <execution>
      <id>integration-test</id>
      <phase>integration-test</phase>
      <goals>
        <goal>test</goal>
      </goals>
      <configuration>
        <skip>false</skip>
        <includes>
          <include>**/*IntegrationTest.java</include>
        </includes>
      </configuration>
    </execution>
  </executions>
</plugin>


It requires that the tests are named consistently, but that isn't such a big deal. Now the test phase runs just the unit tests (named *Test.java, minus anything named *IntegrationTest.java), and a build that goes through to integration-test runs everything, but in two steps: the unit tests in the test phase, then the integration tests in the integration-test phase.

Saturday, 26 January 2008

a bit of fun with collections

A while ago a friend introduced me to hamcrest matchers, and soon after that I started looking into how I could use them to make working with collections easier. I've spent a little while working with dynamic languages such as groovy, ruby and python, and they have got under my skin. In the java world you just don't have the flexibility of fancy closures and chained method calls.

In order to satisfy my craving for collection tools, what I really want to do is what this guy claims to be able to do. I first looked at Sam Newman's hamcrest-collections project. It's good, but it didn't quite do everything I wanted.

So, in true 'not invented here' style, I've written my own collections library.

http://code.google.com/p/logicalpractice-collections/

It's not 100% complete, but it does allow you to do some quite clever stuff:

smiths = select(from(people).getLastName(), equalToIgnoringCase("smith"));

Give it a go and let me know what you think.