Saturday, October 6, 2012

Fetch and process information from Cristin

Yesterday I wrote a small Python script that fetches my publications and generates a publication list for web pages. I use a web service (ws) provided by Cristin (Current Research Information System in Norway). Cristin is a research information system for hospitals, research institutes, universities and university colleges. From the provided ws I use the method hentVarbeiderPerson (see Brukerdokumentasjon Cristin Web Service for details). The argument lopenr is set to the unique id of the researcher (whose publications I want to fetch). The argument format is set to json and the argument sortering is set to AAR_PERSON_TITTEL (meaning sort the results by year, name and title). The query string then becomes (replace <ID> with the unique id of the researcher):

http://www.cristin.no/ws/hentVarbeiderPerson?lopenr=<ID>&format=json&sortering=AAR_PERSON_TITTEL
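If you would rather build the query from Python than type it by hand, here is a minimal sketch for Python 3. The function names build_url and fetch are my own for this example, and I assume the response is UTF-8 encoded; only the endpoint and the three arguments come from the query above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://www.cristin.no/ws/hentVarbeiderPerson"


def build_url(researcher_id):
    """Return the query URL for one researcher (hypothetical helper)."""
    # A list of pairs keeps the arguments in the same order as the URL above.
    params = [
        ("lopenr", researcher_id),
        ("format", "json"),
        ("sortering", "AAR_PERSON_TITTEL"),
    ]
    return BASE_URL + "?" + urlencode(params)


def fetch(researcher_id):
    """Fetch the raw JSON text. Assumption: the service answers in UTF-8."""
    with urlopen(build_url(researcher_id)) as response:
        return response.read().decode("utf-8")
```

Calling build_url with a real id gives the same URL you would otherwise paste into the browser.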

The query returns the publications in JSON format. You can try this URL (with a real value for the id) in a browser or with a command line program like curl. The Python script processes the JSON and generates HTML. The script is tailored towards the publications registered in my name. However, it should be easy to modify the script to match your types of publications. It reads the fetched JSON from stdin and writes the generated HTML to stdout. Errors and warnings are written to stderr. You should check the warnings. They might give you a hint about which types of publications my script doesn't support.
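The stdin/stdout/stderr split can be sketched like this. It is not my actual script, only the skeleton of the idea. The top-level key "forskningsresultat" and the field names "tittel" and "ar" are assumptions for this example, so check them against the JSON you actually get back.

```python
import html
import json


def render(infile, outfile, errfile):
    """Read publication JSON from infile and write an HTML list to outfile."""
    data = json.load(infile)
    # Assumption: the publications sit in a list under this top-level key.
    entries = data.get("forskningsresultat", [])
    outfile.write("<ul>\n")
    for entry in entries:
        common = entry.get("fellesdata")
        if not common or "tittel" not in common:
            # Problems go to the error stream so the HTML stays clean.
            errfile.write("Warning: skipping entry without a title\n")
            continue
        title = html.escape(str(common["tittel"]))
        year = html.escape(str(common.get("ar", "")))
        outfile.write("<li>{0} ({1})</li>\n".format(title, year))
    outfile.write("</ul>\n")

# To use it as a filter, call: render(sys.stdin, sys.stdout, sys.stderr)
```

Because the three streams are passed in as arguments, you can pipe curl into the script on the command line and still test the function with in-memory files.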

In the data returned from Cristin each publication has two data sets. The "fellesdata" (common data, see commonmap in the script) part includes title, authors, year and some information on what type of publication it is. The "kategoridata" (category data, see catmap in the script) part includes information about how the publication was published. Based on these two sets of information I group the publications into different sections (like Web pages, Conference papers, Journal papers, and Reports, see tmap for details). The order attribute specifies the order of the information inside each entry, and the porder attribute specifies the order of the sections (the different types of publications). Have fun with it. The Python script is written for Python 3, but it should work with Python 2.
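To show how order and porder work together, here is a small sketch. The table below is a made-up stand-in for tmap: the type codes, field names and section numbers are invented for this example and the real tmap in the script looks different.

```python
# Hypothetical stand-in for tmap; type codes and field names are invented.
TMAP = {
    "journal": {"title": "Journal papers", "porder": 1,
                "order": ["authors", "title", "journal", "year"]},
    "conference": {"title": "Conference papers", "porder": 2,
                   "order": ["authors", "title", "event", "year"]},
    "report": {"title": "Reports", "porder": 3,
               "order": ["authors", "title", "year"]},
}


def group_publications(pubs, tmap=TMAP):
    """Return [(section title, [formatted entry, ...])] sorted by porder."""
    sections = {}
    for pub in pubs:
        spec = tmap.get(pub.get("type"))
        if spec is None:
            continue  # unsupported type; a warning on stderr would fit here
        # 'order' fixes the sequence of the fields inside one entry.
        parts = [str(pub[field]) for field in spec["order"] if field in pub]
        sections.setdefault(pub["type"], []).append(", ".join(parts))
    # 'porder' fixes the sequence of the sections on the page.
    ordered = sorted(sections, key=lambda code: tmap[code]["porder"])
    return [(tmap[code]["title"], sections[code]) for code in ordered]
```

Entries keep the order in which they arrive inside each section, so the sorting done by the web service (year, name, title) is preserved.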
