Static Publications Site Tutorial (ORC-Schlange)

Probably every research group in the world faces the problem of collecting all publications of all group members. Usually, the list of publications is displayed in a structured way on the group's homepage to provide an overview of the research topics and the impact of the group.

The larger and older the group, the more publications are in this list and the more painful the manual collection of the publication list becomes. Additional features such as searching for authors, keywords, and titles, linking additional author data to the publication (such as the membership period in the group), and handling name changes turn a simple publication list into an interesting use case for big data.

An effective solution to this problem is given in this tutorial. The tutorial is written for Python beginners and introduces many techniques:

  • Advanced features of Python 3.6
  • Interacting with SQLite in Python
  • Interacting with a REST-API in Python
  • Interacting with the ORCID public API
  • Reading and writing BibTeX files in Python
  • Creating HTML from a BibTeX file in Python
  • Filtering HTML content with JavaScript

Understanding basic Python syntax is required for this tutorial, but all advanced features are explained.

The tutorial is subdivided into 8 parts. Each part introduces a technique and demonstrates its usage for the use case. You can thus jump to the part of interest or follow the tutorial step by step. A full understanding of the use case can only be achieved by reading the complete tutorial.


Introduction

In many institutes the publication list is maintained manually, either by an administrator or by every member of the group. The result is in both cases the same: the sites are outdated. There are different reasons for this: publications get lost on the way to the administrator, or somebody forgets to insert her work. Furthermore, the publications are usually stored in a database, which is potentially open to attacks. Since these systems are often handmade solutions, even the possibility of filtering the data is limited. Thus, finding a relevant publication might be difficult even if the list is up to date.

Therefore, we need a system with these features:

  1. It updates itself.
  2. It needs no administration.
  3. It is a static site with no DB in the background.
  4. It is interactive and allows filtering of the data.

The largest problem here is clearly point 1. It leads to the question: where do we get the most recent data from?

Fortunately, there is a project called ORCID (https://orcid.org). Its vision is:

"ORCID’s vision is a world where all who participate in research, scholarship, and innovation are uniquely identified and connected to their contributions across disciplines, borders, and time."

This matches perfectly with our aim to collect all data of a group of researchers. So the idea is clear:

  • Collect the ORCIDs of your researchers.
  • Collect all publications for these IDs.

We are even luckier because ORCID has a free public API. Thus, the data can be collected automatically.

ORCID runs a sandbox instance of this API, which comes in handy for our tutorial.

For this tutorial, there are three fictional ORCIDs in the Sandbox that we use:

The complete project is written following the object-oriented programming paradigm, i.e. it is encapsulated into classes. In Python, all these classes can be put in one file or spread over different packages. In this tutorial, all classes are put into one file. The tutorial is written to work with Python 3.6.

An overview of the data flow in the tutorial looks as follows:

The data is held in four different forms (horizontal lines) and six different states (vertical lines). The data flow is shown by the arrows. At the bottom, the corresponding sections of the tutorial are shown.


SQLite in Python

The first step is to collect the ORCIDs of your researchers. We only need to save the informational content. The "-" separators are the same for every ORCID, so they carry no information and can be omitted. In our case we have three of them:

  • 0000000219094153
  • 000000020183570X
  • 0000000303977442

These three must be saved somewhere. This is the only part where the program interacts with a DB, and it happens only in the backend. Thus, there is no possibility to interact with this DB from outside.

We use a really simple SQLite DB with one table "people" that has three fields:

  1. orcid CHARACTER(16)
  2. start DATE
  3. end DATE

The first field is the ORCID of your researcher. The other two represent the period of time the researcher belonged to your institute. Of course, people change groups, and a publication that was written before or after their time at the institute should not be listed in your publication list. In our case, a SQLite DB is a good choice due to its simplicity and the fact that the queries on the DB are not time critical.

To interact with SQLite, Python has a standard library: sqlite3. This library fulfills the Python DB-API 2.0 standard. As a result, it can be replaced with any other SQL DB interface and the commands work the same.

First a Connection to the DB is created. Afterwards, a Cursor is retrieved from this connection. The cursor can execute SQL commands. After all commands are executed, the Connection is used to save the data with a commit. At the end, the Connection is closed with the close command.

The following simple Python script fills the DB:

import sqlite3

conn = sqlite3.connect('people.db')
c = conn.cursor()

c.execute("CREATE TABLE people (orcid CHARACTER(16), start DATE, end DATE)")
c.execute("INSERT INTO people VALUES ('0000000219094153','1900-01-01','2016-12-31')")
c.execute("INSERT INTO people VALUES ('000000020183570X','1900-01-01','2016-12-31')")
c.execute("INSERT INTO people VALUES ('0000000303977442','1900-01-01','2016-12-31')")

conn.commit()

conn.close()

Lines 3 and 4 create the connection and the cursor. The file where the SQLite DB is saved is named "people.db". In line 6 the table is created as described above. In lines 7-9 the values of the three ORCIDs are inserted. The three researchers stayed from January 1, 1900 to December 31, 2016 at our fictional institute. The script is only run once. After that, the data is saved in people.db and can be read.
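As a side note, sqlite3 can also insert several rows with one parameterized statement via executemany. The following sketch uses an in-memory database (':memory:') so it does not interfere with people.db:

```python
import sqlite3

# An in-memory DB is used here so the example does not modify people.db.
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute("CREATE TABLE people (orcid CHARACTER(16), start DATE, end DATE)")

rows = [
    ('0000000219094153', '1900-01-01', '2016-12-31'),
    ('000000020183570X', '1900-01-01', '2016-12-31'),
    ('0000000303977442', '1900-01-01', '2016-12-31'),
]
# executemany runs the same INSERT once per tuple; each "?" is a placeholder.
c.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)
conn.commit()

count = c.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(count)  # prints 3
conn.close()
```

The "?" placeholders also protect against SQL injection, which does not matter for our fixed data but is good practice.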

Now, a new script is written that contains all classes of this project as well as the main function. This script is named __main__.py. This name is used in Python to express that the main function of the package is contained in this file. The command "python ." can be used to start the script.

The first class represents a DB object that interacts with the SQLite DB. It has an initialization function, a function to get the data as a list, and a function to close the connection:

from sqlite3 import connect

class DB:
	def __init__(self, path="people.db"):
		self.conn = connect(path)
		self.c = self.conn.cursor()
	def getList(self):
		self.c.execute('SELECT * FROM people')
		return self.c.fetchall()
	def close(self):
		self.conn.close()

Line 1 imports the connect function from the sqlite3 library. In line 4 the initialization is defined with the __init__ keyword. It has an optional parameter that contains the path to the DB file. The getList function (lines 7-8) uses SQL syntax to select all fields and all entries from the table people. The DB-API 2.0 function fetchall fetches all entries and returns them as a list of tuples. These tuples contain all three fields as strings. The last function closes the Connection to the DB.

We create a simple main function that uses the DB class:

if __name__ == "__main__":
	db = DB()
	print(db.getList())
	db.close()

It should produce an output like this:

[('0000000219094153', '1900-01-01', '2016-12-31'), ('000000020183570X', '1900-01-01', '2016-12-31'), ('0000000303977442', '1900-01-01', '2016-12-31')]

The complete result of this section can be downloaded here:


Date, OrcID and WorkSummary Class (Filter 1)

In the following, we will define our data model. It consists of three classes: Date, OrcID, and WorkSummary. Data retrieved from ORCID using the ORCID-API will be parsed into objects of these types. The classes furthermore implement methods for comparison, which will be used to filter the data.

The first class is the Date class. It represents a date: either a publication date or the start or stop date of a researcher's membership at the institute. The specialty of this Date class is that it only needs a year to be a valid date. Undefined parts are None and are considered equal to everything. A function is implemented that reflects this fact.

class Date:
	def __init__(self, y, m, d):
		self.y = int(y)
		self.m = int(m) if m else None
		self.d = int(d) if d else None
	def check(self, other, attr):
		if getattr(self, attr) is None or getattr(other, attr) is None:
			return 1
		if getattr(self,attr) < getattr(other,attr):
			return 1
		if getattr(self,attr) > getattr(other,attr):
			return -1
		return 0
	__le__ = lambda self, other: True if 1 == (self.check(other,"y") or self.check(other,"m") or self.check(other,"d") or 1) else False
	__str__ = lambda self: str(self.y) + "-" + str(self.m) + "-" + str(self.d)

Lines 2-5 contain the initialization of the Date class, where the year, month, and day can each be a number, a string representing the number, or, in the case of month and day, None. To make a number out of the string, the built-in function int() is used. In lines 4 and 5 a conditional expression is used to check if the month (day) is None. Python interprets different values as False. The definition is:

In the context of Boolean operations, and also when expressions are used by control flow statements, the following values are interpreted as false: False, None, numeric zero of all types, and empty strings and containers (including strings, tuples, lists, dictionaries, sets and frozensets). All other values are interpreted as true.

Here, None is interpreted as False and falls into the part after the else, so that None is saved for this variable. This is necessary because int(None) raises an error.
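The pattern can be tried in isolation; to_int_or_none below is a hypothetical helper, not part of the tutorial's classes:

```python
def to_int_or_none(x):
    # None and "" are interpreted as False, so they fall into the else branch,
    # avoiding the TypeError that int(None) would raise.
    return int(x) if x else None

print(to_int_or_none("5"))   # 5
print(to_int_or_none(None))  # None
```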

Lines 6-13 contain a helper function that checks, for another date and a given attribute ("y", "m", or "d"), whether one of them is smaller at this attribute. To get access to this attribute, another built-in function is used: the getattr() function. Line 7 checks if the attribute is None in one of the dates. If this is the case, 1 is returned to show that self is smaller than or equal to other. In the case where self is smaller than other (line 9), 1 is also returned. If self is larger than other for this attribute (line 11), -1 is returned. When none of these is the case, the attribute must be equal; in this case 0 is returned.

In line 14, the less-than-or-equal function is defined. In a Python object, this is done by overriding the __le__ function. The function is written as a lambda function. It gets self and other as arguments. Then there is again a conditional expression that returns True if a chain of checks returns 1 and False otherwise. This chain is linked with "or" and works because of the special definition of or:

"The expression x or y first evaluates x; if x is True, its value is returned; otherwise, y is evaluated and the resulting value is returned."

First, the years (y) are compared. If they are equal, 0 is returned. This is interpreted as False and the next check is evaluated. Otherwise, if self.y is smaller than other.y, the check returns 1 and the evaluation stops. The result then is 1 and True is returned. If self.y is larger than other.y, the check returns -1, which also stops the evaluation but is unequal to 1, so False is returned. In this way, all three parts from year to day are checked. If all are equal, all are interpreted as False, and the fourth element of the chain, which is simply 1, is evaluated, so that in this case the function also returns True.
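The short-circuit behaviour of the chain can be illustrated with plain numbers standing in for the check() results (a standalone sketch, not part of the Date class):

```python
# 0 = attributes equal (falsy, keep checking), 1 = smaller, -1 = larger (truthy, stop).
print(0 or 1 or 0 or 1)   # year equal, month smaller: returns 1, i.e. True
print(0 or -1 or 1 or 1)  # month larger: returns -1, which is not 1, i.e. False
print(0 or 0 or 0 or 1)   # all parts equal: the final 1 makes the result True
```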

In the last line, the standard string representation of the Date class is overridden. This is again done via a special method (__str__). Again a lambda function is used that simply concatenates the strings of the three parts with a "-" between them.

The second class is the OrcID class. This class represents an ORCID as stored in the SQLite DB. So, it gets an id, a start and a stop date for initialization. We also define two helper functions: first, a getID function that formats the ORCID with a "-" every 4 symbols; second, a function to get a nice string representation of the OrcID object.

class OrcID:
	def __init__(self, id, start, stop):
		self.id = id
		self.start = Date(*start.split("-"))
		self.stop = Date(*stop.split("-"))
	getID = lambda self: "-".join([self.id[4 * i : 4 * (i + 1)] for i in range(4)])
	__str__ = lambda self: self.getID() + ": " + str(self.start) + " - " + str(self.stop)

The initialization in lines 2-5 saves the id, converts the start and stop dates into Date objects, and stores them. The conversions in lines 4 and 5 expect the input string in the format "YYYY-MM-DD", which is the format in which SQLite returns the dates. First, the string method split is used to turn the string into the list [y, m, d]. This list is then unpacked into single arguments with the "*" operator.
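Splitting and unpacking can be tried on their own; make_date below is a hypothetical stand-in for the Date constructor:

```python
def make_date(y, m, d):
    # Stand-in for Date.__init__: just collects the three parts.
    return (int(y), int(m), int(d))

parts = "2016-12-31".split("-")  # ['2016', '12', '31']
print(make_date(*parts))         # (2016, 12, 31) - "*" unpacks the list
```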

In line 6 the getID function is defined as a lambda function. The goal of placing a "-" every 4 symbols is reached by first creating a list of these 4-symbol blocks. For this, i iterates over range(4), i.e. [0, 1, 2, 3], and the block from 4*i to 4*(i+1) is created. Afterwards, these blocks are joined into one string with "-" as separator using the str.join function. The last line is the simple string representation of the OrcID object in the form "id: start - stop".
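The slicing and joining in getID can be reproduced step by step with one of the IDs from the DB:

```python
raw = "0000000219094153"
# Cut the 16 characters into four blocks of 4 symbols each.
blocks = [raw[4 * i : 4 * (i + 1)] for i in range(4)]
print(blocks)            # ['0000', '0002', '1909', '4153']
print("-".join(blocks))  # 0000-0002-1909-4153
```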

To test these two classes, we change the main function as follows:

if __name__ == "__main__":
	db = DB()
	orcs = [OrcID(*t) for t in db.getList()]
	db.close()
	for orc in orcs:
		print("Do something with",orc)

In line 3, a list comprehension is used to create a list of OrcIDs out of the list of tuples that db.getList() returns. Again, unpacking is used to resolve the tuples saved in t into the parameters that OrcID expects.

The output should look like this:

Do something with 0000-0002-1909-4153: 1900-1-1 - 2016-12-31
Do something with 0000-0002-0183-570X: 1900-1-1 - 2016-12-31
Do something with 0000-0003-0397-7442: 1900-1-1 - 2016-12-31


The last class is the WorkSummary class. This class represents the summary of a work, i.e. a publication. A WorkSummary has three fields: a path where more information can be found, a title, and a publication date. Objects of this class will be compared later in the filter step. Thus, a less-than and an equals function are implemented.

class WorkSummary:
	def __init__(self, path, title, date):
		self.path = path
		self.title = title
		self.date = date
	__lt__ = lambda self, other: self.date.y < other.date.y or (self.date.y == other.date.y and self.title < other.title)
	__eq__ = lambda self, other: self.date.y == other.date.y and self.title == other.title
	__str__ = lambda self: self.title + ": " + str(self.date)

The initialization in lines 2-5 is straightforward.

The two comparisons in lines 6 and 7 only compare the year of the publication and the title. The rest of the Date is not used, to keep the comparison from being overly specific. Both comparisons are written in lambda form. In line 6, the less-than function uses the keyword __lt__ and checks if the year is smaller, or if the year is equal and the title is smaller. For the equality comparison, the keyword __eq__ is used. The function is straightforward: only when both the year and the title are equal are the WorkSummary objects equal. The last line is a simple string representation of the WorkSummary object in the form "title: date".
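That these two methods are enough for sorting can be demonstrated with a reduced stand-in for WorkSummary (a hypothetical MiniSummary class holding only a year and a title):

```python
class MiniSummary:
    def __init__(self, year, title):
        self.year = year
        self.title = title
    # Compare by year first, then alphabetically by title.
    __lt__ = lambda self, other: self.year < other.year or (self.year == other.year and self.title < other.title)
    __eq__ = lambda self, other: self.year == other.year and self.title == other.title

docs = [MiniSummary(2015, "B"), MiniSummary(2013, "Z"), MiniSummary(2015, "A")]
docs.sort()  # list.sort only needs __lt__ to order the objects
print([(d.year, d.title) for d in docs])  # [(2013, 'Z'), (2015, 'A'), (2015, 'B')]
```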

The complete result of this section can be downloaded here:


REST-API

Now we have everything ready to interact with the public ORCID-API. To do this, we use a third-party library: the Requests: HTTP for Humans library.

The requests library says about itself:

"Requests is the only Non-GMO HTTP library for Python, safe for human consumption."

Using this library is very simple and straightforward. However, first we need to install it. Thanks to pip this is very simple. You have to run:

pip install requests

If this does not work, have a look at the installation instructions of requests.

With requests installed, we can look at the public API of ORCID. As a reminder, we want to read public data out of the sandbox instance of ORCID. We use the 2.0 version of the API and get all answers in json format. How this is done is explained on the website of the API and in a basic tutorial from ORCID.

This tutorial will cover all required endpoints and queries used for the publication list. However, if you want further information or change the queries, the links above are good points to start.

The interaction with the API needs to be authorized. This authorization uses a client_id and a client_secret that you can create from your account for an app. Here we can simply use the already created credentials of Norbert E. Horn. Since it is in the sandbox, the data does not need to be kept secret. With these credentials, we can receive a read-public access token. This is the first API interaction with a specific endpoint. The answer contains the token. This token is then used for all other interactions with the API and is sent with every request.

Besides this initial interaction to get the token, we have two more interactions with the API: getting all WorkSummarys of a specific OrcID and getting the complete work for a specific WorkSummary. However, we start with the initialization of the class, where we get the access token:

from requests import Session
class API:
	auth = "https://sandbox.orcid.org/oauth/token"
	ORC_client_id = "APP-DZ4II2NELOUB89VC"
	ORC_client_secret = "c0a5796e-4ed3-494b-987e-827755174718"
	def __init__(self):
		self.s = Session()
		self.s.headers = {'Accept': 'application/json'}
		data = {"grant_type":"client_credentials", "scope":"/read-public","client_id":self.ORC_client_id, "client_secret":self.ORC_client_secret}
		r = self.s.request(method ="post",url= self.auth, data=data)
		self.s.headers = {'Accept': 'application/json', 'Authorization': 'Bearer ' + r.json()["access_token"]}

In the first line, a class called Session is imported from the requests library. This is the Session that interacts with the API.

In lines 3-5 the class variables used to get the access token are saved: first the url that provides the token for the ORCID sandbox, then the credentials of Norbert E. Horn.

In line 6, the initialization starts. In line 7, a new Session object is created and saved in self. In line 8, the headers property of this Session is set. It is a dict with data that is sent as header in all requests made with this session. At this point, we save the information that we want the answers in json format. In line 9, a dict is saved that contains the data that is sent with the request. Four properties are set:

  • The "grant_type" which is set to "client_credentials" because we use client credentials to get the token.
  • The "scope" which is set to "\read-public" because we want a token to read public data.
  • The "client_id" to authenticate the access.
  • The "client_secret" to verify the authentication.

In line 10, finally, the request is sent. This is done by the Session object with the request function. This function is given three arguments:

  • The method, which is the HTTP method; here we make a "post".
  • The url which is here the auth address.
  • The data as the prepared data dict.

The result is a Response object. Its json() function parses the response and creates a dict of the result. In this dict, the key "access_token" stores the token (as a string). In line 11, it is extracted and saved as a new header for the Session.

The next class function is the getWorks function that gets all WorkSummarys of a given ID. To do this, we need the endpoint of the ORCID-API that returns this information. An overview of all endpoints is given in the basic tutorial as a table. The endpoint that we are looking for is "/works". It gives a summary of research works. To complete the URL, we also need the resource URL for the v2.0 public API of the sandbox: "https://pub.sandbox.orcid.org/v2.0". The complete URL is then:

https://pub.sandbox.orcid.org/v2.0/[ORCID iD]/works

An overview of what the response from the API looks like is given on GitHub. However, it is not trivial to understand. To get the idea faster, it is better to look at an example response.

In the following, a shortened example response is shown:

{
	'last-modified-date': {'value': 1497863814424}, 
	'group': [
		{
			'last-modified-date': { 'value': 1497791040610}, 
			'external-ids': {
				'external-id': []
			}, 
			'work-summary': [
				{
					'put-code': 837564, 
					'created-date': {'value': 1497791040610}, 
					'last-modified-date': {'value': 1497791040610}, 
					'source': {
						'source-orcid': {
							'uri': 'http://sandbox.orcid.org/0000-0002-1909-4153', 
							'path': '0000-0002-1909-4153', 
							'host': 'sandbox.orcid.org'
						}, 
						'source-client-id': None, 
						'source-name': {'value': 'Norbert E. Horn'}
					}, 
					'title': {
						'title': {'value': 'Finding the data unicorn: A hierarchy of hybridity in data and computational journalism'}, 
						'subtitle': None, 
						'translated-title': None
					}, 
					'external-ids': {
						'external-id': []
					}, 
					'type': 'JOURNAL_ARTICLE', 
					'publication-date': {
						'year': {'value': '2017'}, 
						'month': None, 
						'day': None, 
						'media-type': None
					}, 
					'visibility': 'PUBLIC', 
					'path': '/0000-0002-1909-4153/work/837564', 
					'display-index': '1'
				}
			]
		}, 
		...
	],
	'path': '/0000-0002-1909-4153/works'
}

The response is a json object with three members: last-modified-date, group, and path. Only the group member is interesting for the moment because it is a list of the work summaries. In this list, for every work of the researcher, a json object with again three members is stored. For us, only the work-summary member is interesting. It is a list with exactly one element, which is the work-summary object.

This work-summary object has many members. Three of them are interesting for us: the title, the publication-date, and the path. The last one contains the API path to the complete record of this work. The title is again an object, where the title as a string can be found in title.value. The publication-date is an object that has separate members for year, month, and day. Each can be None or store the respective date value in value.

Now, we need a Python function that parses this information:

	baseurl = "https://pub.sandbox.orcid.org/v2.0"
	getDate = lambda self,d: Date(d["year"]["value"],d["month"]["value"] if d["month"] else None, d["day"]["value"] if d["day"] else None )
	def getWorks(self,id):
		r = self.s.request(method= "get",url = "{0}/{1}/works".format( self.baseurl, id.getID()))
		for work in (w["work-summary"][0] for w in r.json()["group"]):
			yield WorkSummary(work["path"],work["title"]["title"]["value"],self.getDate(work["publication-date"]))

In line 1, the baseurl is saved as a class variable so that it can be used later. In line 2, a helper function is defined that transforms a publication-date object from the API into a Date object as described above. The function is a lambda function and gets self and the date as a dict d as input. The transformation is not completely trivial because the day should be None if ["day"] is None, and otherwise it is the value of ["day"]["value"]. The latter raises an error if ["day"] is None. To solve this, a conditional expression is used to check that ["day"] is not None; if so, the value of ["day"]["value"] is used. The same must be done for the month.

The getWorks function gets as argument an id, which should be an OrcID object. For this id, all summaries are loaded and all WorkSummary objects are returned. However, a look at the function shows that it is not a normal function with a return keyword; instead, the yield keyword is used. This makes the function a generator function that returns an iterator. Such an iterator can be used in a for loop, where in every iteration the next() function is called. The generator is executed like every function until a yield is reached. The value behind yield is the result of the first next(). The state of the function is saved. Every time the next() function on the iterator is called, the function continues at the point where the last yield was reached, until the next yield is reached. This means that after one value is processed, it can be discarded from memory. When no further yield is reached, the iteration stops.
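The yield mechanics can be seen in a minimal generator, unrelated to the API:

```python
def count_up(n):
    # Execution pauses at yield and resumes on the next next() call.
    for i in range(n):
        yield i

gen = count_up(3)
print(next(gen))  # 0
print(next(gen))  # 1
print(list(gen))  # [2] - the remaining values
```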

In line 4, the request is sent. In this case, the "get" method is used, as the API expects. The url is created with string formatting. In line 5, the iteration over the works starts. This is done with a special case of the list comprehension. The works are given as a list in the "group" member, so we want to iterate over all elements in this list. The objects are saved in w. For these objects, the work summary is found in the first element of the member "work-summary", so we only want to iterate over this. A list comprehension:

[w["work-summary"][0] for w in r.json()["group"]]

creates a list with exactly the objects that we iterate through. However, in the getWorks function the outer "[]" are replaced with "()". This means that we do not create a list but an iterator via a generator expression. Thus, every element is created when it is needed and discarded afterwards. This is more memory friendly than creating the complete list. In line 6, for every work, a WorkSummary object is created and yielded. The path and title are obtained by simply navigating to the strings in the json. For the date, the getDate function is used.

Using this function, we can create a list of all works and filter them. This is described in the next chapter. However, after the filtering, the complete data of the work should be parsed. So, we need a second function in the API that gets the complete work for a WorkSummary. The endpoint for this is "/work/[id]". These endpoints, including the id, are already saved in the WorkSummarys as path.

The following shows an example for a response of this endpoint:

{
	'created-date': {'value': 1497791040610}, 
	'last-modified-date': {'value': 1497791040610}, 
	'source': {
		'source-orcid': {
			'uri': 'http://sandbox.orcid.org/0000-0002-1909-4153', 
			'path': '0000-0002-1909-4153', 
			'host': 'sandbox.orcid.org'
		}, 
		'source-client-id': None, 
		'source-name': {'value': 'Norbert E. Horn'}
	}, 
	'put-code': 837564, 
	'path': '/0000-0002-1909-4153/work/837564', 
	'title': {
		'title': {'value': 'Finding the data unicorn: A hierarchy of hybridity in data and computational journalism'}, 
		'subtitle': None, 
		'translated-title': None
	}, 
	'journal-title': {'value': 'Digital Journalism'}, 
	'short-description': None, 
	'citation': {
		'citation-type': 'BIBTEX', 
		'citation-value': '@article{hermida2017finding, title= {Finding the data unicorn: A hierarchy of hybridity in data and computational journalism}, author= {Hermida, Alfred and Young, Mary Lynn}, journal= {Digital Journalism}, volume= {5}, number= {2}, pages= {159--176}, year= {2017}, publisher= {Routledge}}\n\n'
	}, 
	'type': 'JOURNAL_ARTICLE', 
	'publication-date': {
		'year': {'value': '2017'}, 
		'month': None, 
		'day': None, 
		'media-type': None
	}, 
	'external-ids': {'external-id': None}, 
	'url': None, 
	'contributors': {'contributor': []}, 
	'language-code': None, 
	'country': None, 
	'visibility': 'PUBLIC'
}

The response is one object with many members. Some of them are already known from the summary; others are new but also not interesting for us. In fact, the only way to get the complete record is to read the citation; the other members do not contain all information. So here, we are interested in this and simply want a function that returns the citation-value.

The function looks as follows:

	def getWork(self, summary):
		r = self.s.request(method= "get",url= self.baseurl + summary.path)
		return r.json()['citation']['citation-value']

The function is straightforward. It gets a summary, which is a WorkSummary object, as input. First, the request is made (line 2). It is again a "get", and the url is the combination of the baseurl and the path of the summary. Then, the citation-value is obtained from the response and returned.

The complete class:

from requests import Session
class API:
	auth = "https://sandbox.orcid.org/oauth/token"
	ORC_client_id = "APP-DZ4II2NELOUB89VC"
	ORC_client_secret = "c0a5796e-4ed3-494b-987e-827755174718"
	def __init__(self):
		self.s = Session()
		self.s.headers = {'Accept': 'application/json'}
		data = {"grant_type":"client_credentials", "scope":"/read-public","client_id":self.ORC_client_id, "client_secret":self.ORC_client_secret}
		r = self.s.request(method ="post",url= self.auth, data=data)
		self.s.headers = {'Accept': 'application/json', 'Authorization': 'Bearer ' + r.json()["access_token"]}
	baseurl = "https://pub.sandbox.orcid.org/v2.0"
	getDate = lambda self,d: Date(d["year"]["value"],d["month"]["value"] if d["month"] else None, d["day"]["value"] if d["day"] else None )
	def getWorks(self,id):
		r = self.s.request(method= "get",url = "{0}/{1}/works".format( self.baseurl, id.getID()))
		for work in (w["work-summary"][0] for w in r.json()["group"]):
			yield WorkSummary(work["path"],work["title"]["title"]["value"],self.getDate(work["publication-date"]))
	def getWork(self, summary):
		r = self.s.request(method= "get",url= self.baseurl + summary.path)
		return r.json()['citation']['citation-value']

With this, we can write a new main function that gets all the WorkSummarys and prints them:

if __name__ == "__main__":
	db = DB()
	orcs = [OrcID(*t) for t in db.getList()]
	db.close()
	alldocs = []
	api = API()
	for orc in orcs:
		alldocs += api.getWorks(orc)
	for d in alldocs:
		print (d)

In line 9, getWorks is called and the resulting works are appended with "+=" to the alldocs list.
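That "+=" drains a generator into a list can be checked with a small sketch (toy_works is a hypothetical stand-in for api.getWorks):

```python
def toy_works():
    yield "work A"
    yield "work B"

alldocs = []
alldocs += toy_works()  # += iterates the generator and appends each element
print(alldocs)          # ['work A', 'work B']
```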

The output should look like this:

Finding the data unicorn: A hierarchy of hybridity in data and computational journalism: 2017-None-None
The generalized unicorn problem in Finsler geometry: 2015-None-None
Unicorn: A system for searching the social graph: 2013-None-None
The unicorn, the normal curve, and other improbable creatures.: 1989-None-None
Combined Measurement of the Higgs Boson Mass in p p Collisions at s= 7 and 8 TeV with the ATLAS and CMS Experiments: 2015-None-None
It's a small world: 1998-None-None
Combined Measurement of the Higgs Boson Mass in p p Collisions at s= 7 and 8 TeV with the ATLAS and CMS Experiments: 2015-None-None
11 The Death of the Author: 1994-None-None
Kritik der reinen Vernunft: 1889-None-None

The complete result of this section can be downloaded here:


Sorting and Filtering (Filter 2)

In this step, we want to get rid of works that do not fall into the period a person worked for us. We also want to remove duplicate works.

For this, we use the comparison functions that are implemented in the classes and some standard python libraries.

First, we want to get rid of the works that do not overlap with the dates of the OrcID objects, i.e. the works that do not belong to our group.

This is done by altering the line where the getWorks are added to alldocs:

alldocs += [d for d in api.getWorks(orc) if orc.start <= d.date <= orc.stop]

Here, we again use a list comprehension, but with an if condition in it. This means that the new list contains only those elements for which the if condition evaluates to True. The condition checks if the date of the work (d) is between the start and stop date of the OrcID. Note the chained form of these two checks with two <= operators. In fact, this is more efficient than two separate checks:

"Comparisons can be chained arbitrarily, e.g., x < y <= z is equivalent to x < y and y <= z, except that y is evaluated only once (but in both cases z is not evaluated at all when x < y is found to be false)."

With this change, we have a list (alldocs) containing all works that were done by the group. The next step is to sort them. For this, the WorkSummary class already implements the less-than and equality operators, so that we can do a simple sort() call on it:

alldocs.sort()

The last part is removing the duplicate entries. This can be done using the standard library itertools and its groupby function. This function needs a sorted list as input so that equal objects are grouped together. It reduces the list to tuples of a key and an iterator over the grouped objects, where the key is the first of these objects. So we can simply iterate over the function result, keeping only the keys:

import itertools
uniqdocs = [doc for doc,_ in itertools.groupby(alldocs)]

We create a new list that contains only the keys of the groupby call. The first element of each tuple (the key) is saved in doc, and the second is marked with the "_" symbol to be thrown away.
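As a minimal sketch of this deduplication step, with plain strings standing in for the WorkSummary objects:

```python
import itertools

# groupby only merges *adjacent* equal elements, which is exactly
# why the list has to be sorted first.
alldocs = ["b", "a", "c", "a", "b", "b"]
alldocs.sort()
uniqdocs = [doc for doc, _ in itertools.groupby(alldocs)]
print(uniqdocs)  # ['a', 'b', 'c']
```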

With these changes the main function looks as follows:

import itertools
if __name__ == "__main__":
	db = DB()
	orcs = [OrcID(*t) for t in db.getList()]
	db.close()
	alldocs = []
	api = API()
	for orc in orcs:
		alldocs += [d for d in api.getWorks(orc) if orc.start <= d.date <= orc.stop]
	alldocs.sort()
	uniqdocs = [doc for doc,_ in itertools.groupby(alldocs)]
	for d in uniqdocs:
		print (d)

The result is:

The unicorn, the normal curve, and other improbable creatures.: 1989-None-None
11 The Death of the Author: 1994-None-None
It's a small world: 1998-None-None
Unicorn: A system for searching the social graph: 2013-None-None
Combined Measurement of the Higgs Boson Mass in p p Collisions at s= 7 and 8 TeV with the ATLAS and CMS Experiments: 2015-None-None
The generalized unicorn problem in Finsler geometry: 2015-None-None

The list is now sorted and all duplicates are removed.

The complete result of this section can be downloaded here:


BibTeX in Python

BibTeX is a format to hold lists of references. It is interesting for us because the information in ORCID is available as a BibTeX-formatted string. We need to parse this string and convert it into a useful representation.

However, we do not need to reinvent the wheel here. There is already a BibTeX parser for Python called Pybtex, and it provides a Python API that we can use.

First we need to install Pybtex on the system with pip:

pip install pybtex

If this does not work, have a look at the Git repository for more installation instructions.

An overview of the Python library can be found in the documentation of Pybtex.

To read the bibliography from a string, a simple function exists:

pybtex.database.parse_string(value, bib_format, **kwargs)

The return value is a BibliographyData object. The plan is to parse every work and then combine these objects into one BibliographyData object. This can be achieved by creating an empty BibliographyData object and then adding the entries of all parsed BibliographyData objects to it. To get access to the entries, the class variable entries is used, which holds a dict of all entries. A simple helper function is written that does this.

The resulting BibliographyData object can be written to a BibTeX file using its to_file function. Writing it to a file is also a simple way to view the content of such a BibliographyData object.

The complete code looks as follows:

from pybtex.database import BibliographyData, parse_string
def joinBibliography(bib1, bib2):
	for key in bib2.entries:
		bib1.entries[key] = bib2.entries[key]

if __name__ == "__main__":
	db = DB()
	orcs = [OrcID(*t) for t in db.getList()]
	db.close()
	alldocs = []
	api = API()
	for orc in orcs:
		alldocs += [d for d in api.getWorks(orc) if orc.start <= d.date <= orc.stop]
	alldocs.sort()
	uniqdocs = [doc for doc,_ in itertools.groupby(alldocs)]
	bib = BibliographyData()
	for d in uniqdocs:
		joinBibliography (bib,parse_string(api.getWork(d),"bibtex"))
	bib.to_file(open("out.bib","w"))

In line 1, the BibliographyData class and the parsing function are loaded from pybtex. The helper function (lines 2-4) simply adds all entries from bib2 to the entries of bib1. Up to line 16, this is the normal main function. In line 16, a new empty BibliographyData object (bib) is created that is used to collect all data. In line 18, the API function getWork is used to get the BibTeX format of the entry. The input format "bibtex" is given as an argument to the parsing function. The result is added to bib with the helper function. In the last line, the result is then written to a file named "out.bib".

The content of out.bib should look like this and it can be downloaded here:

@article{micceri1989unicorn,
    author = "Micceri, Theodore",
    title = "The unicorn, the normal curve, and other improbable creatures.",
    journal = "Psychological bulletin",
    volume = "105",
    number = "1",
    pages = "156",
    year = "1989",
    publisher = "American Psychological Association"
}

@article{barthes199411,
    author = "Barthes, Roland",
    title = "11 The Death of the Author",
    journal = "Media Texts, Authors and Readers: A Reader",
    pages = "166",
    year = "1994",
    publisher = "Multilingual Matters"
}

@article{collins1998s,
    author = "Collins, James J and Chow, Carson C",
    title = "It's a small world",
    journal = "Nature",
    volume = "393",
    number = "6684",
    pages = "409--410",
    year = "1998",
    publisher = "Nature Publishing Group"
}

@article{curtiss2013unicorn,
    author = "Curtiss, Michael and Becker, Iain and Bosman, Tudor and Doroshenko, Sergey and Grijincu, Lucian and Jackson, Tom and Kunnatur, Sandhya and Lassen, Soren and Pronin, Philip and Sankar, Sriram and others",
    title = "Unicorn: A system for searching the social graph",
    journal = "Proceedings of the VLDB Endowment",
    volume = "6",
    number = "11",
    pages = "1150--1161",
    year = "2013",
    publisher = "VLDB Endowment",
    doi = "10.14778/2536222.2536239"
}

@article{aad2015combined,
    author = "Aad, Georges and Abbott, B and Abdallah, J and Abdinov, O and Aben, R and Abolins, M and AbouZeid, OS and Abramowicz, H and Abreu, H and Abreu, R and others",
    title = "Combined Measurement of the Higgs Boson Mass in p p Collisions at s= 7 and 8 TeV with the ATLAS and CMS Experiments",
    journal = "Physical review letters",
    volume = "114",
    number = "19",
    pages = "191803",
    year = "2015",
    publisher = "APS"
}

@article{cheng2015generalized,
    author = "Cheng, Xinyue and Zou, Yangyang",
    title = "The generalized unicorn problem in Finsler geometry",
    journal = "Differential Geometry-Dynamical Systems",
    volume = "17",
    pages = "38--48",
    year = "2015"
}

Creating such a bib file out of the OrcIDs is already a useful application. The bib data is a standard format that can be used in many cases. However, in our case we want to go a step further and create a pretty website out of the data.

To our advantage, Pybtex already has a system to write HTML files based on a BibliographyData object. The way this is done is simple: first, a Style is created to format the data, and then a Backend is used to write the data in the right format. An HTML backend already exists that we can use. As a simple style, we can use the standard "unsrt" bibliography style.

Using these two, the main function looks as follows:

from pybtex.style.formatting.unsrt import Style
from pybtex.backends.html import Backend
if __name__ == "__main__":
	db = DB()
	orcs = [OrcID(*t) for t in db.getList()]
	db.close()
	alldocs = []
	api = API()
	for orc in orcs:
		alldocs += [d for d in api.getWorks(orc) if orc.start <= d.date <= orc.stop]
	alldocs.sort()
	uniqdocs = [doc for doc,_ in itertools.groupby(alldocs)]
	bib = BibliographyData()
	for d in uniqdocs:
		joinBibliography (bib,parse_string(api.getWork(d),"bibtex"))
	style = Style()
	formatbib = style.format_bibliography(bib)
	back = Backend()
	back.write_to_file(formatbib,"out.html")

In line 1, the unsrt Style is loaded and in line 2, the html Backend. In line 16, the new Style object is created and in line 17, it is used to create a formatted bibliography. In line 18, the Backend object is created and in line 19, it is used to write the formatted bibliography to a file called "out.html".

Here is the resulting HTML site (download here):

This does not look pretty, so we start to tweak the result. First, we look at the Style.

The idea of the style is that for every entry a Rich text object is created. These objects are then rendered by the backend.

The Rich text has six classes:

  • Text
  • String
  • Tag
  • HRef
  • Protected
  • Symbol

The Symbol is the smallest atom and represents one special symbol, like a line break. The String class is another atom of the Rich text classes; the last atom is Protected, which is not affected by case-changing operations. The other classes are containers that can contain all Rich text classes. An HRef creates a link to something, and a Tag has a name and creates a tag with this name. Text is simply a container with no special feature.

These classes give great possibilities to define how an entry should look. However, one thing is missing for our HTML rendering: HTML tags can have attributes like special CSS classes or direct CSS commands. To solve this, we create our own HtmlTag that inherits from the normal Tag:

class HtmlTag(Tag):
	def __init__(self, name, opt, *args):
		super(HtmlTag,self).__init__(name, *args)
		self.options = opt
	def render(self, backend):
		text = super(Tag, self).render(backend)
		try:
			return backend.format_tag(self.name, text, self.options)
		except TypeError:
			return backend.format_tag(self.name, text)

In line 1, the new class is defined with the superclass Tag. In line 2, the initialization starts. It gets a name and *args like a normal Tag as input, but also an extra opt argument for the options. In line 3, the initialization of Tag with name and *args is called using the super function. After this, the extra opt argument is saved in self.options (line 4).

Every Rich text object has a render function that is called with the backend to render the right representation. So we need to overwrite this as well, starting at line 5, to pass the options to the backend. In line 6, the super function is used to render the contained text. This is necessary because a Tag is a container, so all contained Rich texts must be rendered first. Then, the rendering can be sent to the backend and the result is returned (line 8). Like for a normal Tag, the name and the text are given as input, but additionally the options. However, not every backend supports such a rendering: the call can raise a TypeError because the backend expects format_tag to have only two arguments. This should not break the rendering, so exception handling is used to fall back to the normal Tag rendering in such cases. The call is placed in a try block (line 7). After this, an except block is created that catches the TypeError (line 9) and then returns the rendering without the options (line 10).
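The fallback mechanism can be sketched without Pybtex. The two backend classes here are invented stand-ins for a backend with and without options support:

```python
class PlainBackend:
    # A backend that only knows the two-argument form of format_tag.
    def format_tag(self, tag, text):
        return "<{0}>{1}</{0}>".format(tag, text)

class OptionsBackend:
    # A backend whose format_tag accepts a third options argument.
    def format_tag(self, tag, text, options):
        return "<{0} {2}>{1}</{0}>".format(tag, text, options)

def render_tag(backend, tag, text, options):
    # Try the extended call first; if the backend's format_tag does not
    # accept a third argument, a TypeError is raised and we fall back.
    try:
        return backend.format_tag(tag, text, options)
    except TypeError:
        return backend.format_tag(tag, text)

print(render_tag(OptionsBackend(), "div", "hi", 'class="mix"'))
print(render_tag(PlainBackend(), "div", "hi", 'class="mix"'))
```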

With this HtmlTag, we can now create our own style that creates prettier output. The Style has many functions like:

  • format_article
  • format_book
  • format_inbook
  • format_inproceedings

and more. The different functions are called for the different types of BibTeX entries. We only use the format_article function; all entries in our example are articles. However, the other types should not break the complete process, so we again inherit from an existing style. We use the unsrt Style as superclass again. The result should have the form:

<div>
<h4>*title*</h4>
<i>*authors*</i><br>
*journal*<br>
<a href="https://doi.org/*doi*">[ Publishers's page ]</a>
</div>

To get this result, the class looks as follows:

class HtmlStyle(Style):
	def format_article(self, context):
		ret = Text()
		ret += HtmlTag("h4","style=\"margin-bottom: 2px;\"", context.rich_fields['title'])
		ret += Tag("i",context.rich_fields['author']) + Symbol('newblock')
		ret += context.rich_fields['journal']
		if 'volume' in context.fields:
			ret += Symbol("nbsp") + context.rich_fields['volume']
		if 'number' in context.fields:
			ret += Symbol("nbsp") + "(" + context.rich_fields['number'] + ")"
		if 'pages' in context.fields:
			ret = ret + ":" + context.rich_fields['pages']
		if 'doi' in context.fields:
		ret += Symbol('newblock') + HRef('https://doi.org/' + context.fields['doi'],"[ Publisher's page ]")
		return HtmlTag("div","class=\"" + context.fields['year'] +  " mix \"",ret)

The format_article function gets a context as input. This context has the same information as the corresponding entry in the variable fields (for example, line 7). The same information is also given as Rich text in the variable rich_fields (for example, line 4). Where strings are needed, the fields variable is used; where Rich text is needed, the rich_fields variable is used.

In line 3, the return container is initialized as an empty Text(). After this, new content is added at the end of this container. In line 4, the title line is added as an h4. Here, the HtmlTag is used directly; it gets a style option that changes the bottom margin. The authors are added in line 5. They are wrapped into an i-tag before they are added to the text. After this, a newblock Symbol is added, which stands for a line break. In line 6, the journal title is added simply as text. After this, some optional journal information is added (volume, number, and pages); if these fields exist, they are added to the Text. In line 8, the volume is added with another Symbol in front of it: the nbsp Symbol, which stands for a non-breaking space. In line 10, the number is added. Here, "(" and ")" are added as normal strings; they are automatically converted by the Text's add operation into Rich text strings. In line 12, the pages are added. Here, not "ret +=" is used but "ret = ret +". The two look equivalent at first glance, but the evaluation order is not the same. In the second case, ret + ":" is evaluated first, which means the add function of the Rich text is evaluated first. With "ret +=", the expression ":" + context.rich_fields['pages'] would be evaluated first, which triggers the add function of the standard string and creates an error. In line 14, the doi is added as an HRef where the link is given as a standard string. In the last line, the enclosing div is created as an HtmlTag. Here, the options are classes: the year as a number and "mix". The latter is used later in the tutorial.

The new main function looks as follows:

if __name__ == "__main__":
	db = DB()
	orcs = [OrcID(*t) for t in db.getList()]
	db.close()
	alldocs = []
	api = API()
	for orc in orcs:
		alldocs += [d for d in api.getWorks(orc) if orc.start <= d.date <= orc.stop]
	alldocs.sort()
	uniqdocs = [doc for doc,_ in itertools.groupby(alldocs)]
	bib = BibliographyData()
	for d in uniqdocs:
		joinBibliography (bib,parse_string(api.getWork(d),"bibtex"))
	style = HtmlStyle()
	style.sort = lambda x: sorted(x, key = lambda e:-int(e.fields['year']))
	formatbib = style.format_bibliography(bib)
	back = Backend()
	back.write_to_file(formatbib,"out.html")

The only differences to the previous main function are in lines 14 and 15. In line 14, the new HtmlStyle is used instead of Style. In line 15, the sort function is overwritten, which sorts all entries before they are rendered. The entries are sorted by the negative integer value of the year field, so that the sort order is reversed.
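The effect of this key function can be seen with plain dicts standing in for the entries (in BibTeX, the year field is a string, hence the int() conversion):

```python
# Invented entries; fields['year'] is a string in BibTeX data, so it
# is converted to int and negated to get newest-first ordering.
entries = [{"year": "1998"}, {"year": "2015"}, {"year": "1989"}]
newest_first = sorted(entries, key=lambda e: -int(e["year"]))
print([e["year"] for e in newest_first])  # ['2015', '1998', '1989']
```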

The result can be downloaded here and looks much prettier now:

However, some parts do not fit our expectations: the line breaks, the formatting of the title, and the numbers in front of every entry. All three things have their source in the Backend. So we can get rid of them by implementing our own HtmlBackend. Of course, we only want to change the things that are not to our liking, so the class will inherit from the normal html Backend.

The things that we must change are:

  1. The interpretation of Symbols
  2. The use of the HtmlTag
  3. How an entry is written
  4. The enclosing html

The last one is not necessary but makes things simpler in the end.

The result class looks as follows:

class HtmlBackend(Backend):
	symbols = {'ndash': u'&ndash;', 'newblock': u'<br/>\n', 'nbsp': u'&nbsp;'}
	format_tag = lambda self, tag, text, options =None: u'<{0} {2} >{1}</{0}>'.format(tag, text, options if options else "") if text else u''
	label = None
	def write_entry(self, key, label, text):
		if label != self.label:
			self.output(u'<h3 class=\"{0} year\">{0}</h3>\n'.format(label))
			self.label = label
		self.output(u'%s\n' % text)
	write_epilogue = lambda self: self.output(u'</div></body></html>\n')
	prologue = u"""<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
		<html>
		<head><meta name="generator" content="Pybtex">
		<meta http-equiv="Content-Type" content="text/html; charset=%s">
		<title>Bibliography</title>
		{HEAD}
		</head>
		<body>
		{BODY}
		<div id="content">
		"""
	def prepout (self, head, body):
		self.prologue = self.prologue.format(HEAD = head, BODY = body)
	def write_prologue(self):
		try:
			self.prepout("","")
		except ValueError:
			pass
		self.output(self.prologue % (self.encoding or pybtex.io.get_default_encoding()))

In line 2, a class variable symbols is set. The dict has an entry for every Symbol that assigns the corresponding rendering to it. The newblock entry is assigned to <br/> so that the line breaks work.

The next feature is to make the HtmlTag really work. The normal Backend does not render the options field. This is fixed in line 3, where the format_tag function is overwritten with an optional argument options. If it is not given, an empty string is placed in its position, using string formatting and a conditional expression.
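The formatting trick can be tried on its own. This standalone lambda mirrors the overwritten format_tag (without the self argument):

```python
# Positional indices ({0}, {1}, {2}) may repeat and reorder arguments;
# the conditional expression substitutes "" when no options are given,
# and an empty text yields no tag at all.
format_tag = lambda tag, text, options=None: (
    u'<{0} {2} >{1}</{0}>'.format(tag, text, options if options else "")
    if text else u'')

print(format_tag("h4", "Title", 'style="margin-bottom: 2px;"'))
print(format_tag("i", "Authors"))  # no options given
print(repr(format_tag("i", "")))   # empty text -> empty string
```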

In lines 4 to 9, write_entry is overwritten. This is the function that is called for every entry. Here we can get rid of the numbers that are rendered before the entries. The function gets as arguments a citation key, a label, and a text. The citation key is not used by us anywhere. The label is the number that is shown; however, we will change this in the main function so that it is the year of the entry. The last argument is the entry as rendered text. We want that, every time a new year is reached, this year is printed as an <h3>. So we must save in self what the last year was. This is done in self.label. In line 4, it is set to None because no year has been rendered yet. In line 6, it is checked whether a new label (year) is reached with this entry. If this is the case, an <h3> is output in line 7 and this label is saved as the last output label in line 8. Note that the function self.output is used to create the output; this is a function of the Backend that writes the output to the file. In line 9, the rendered text is written with self.output.
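The year-heading logic of write_entry can be sketched independently. The list of (label, text) pairs below is invented for illustration:

```python
def render_entries(entries):
    # Emit a heading only when the label (the year) changes, mirroring
    # the self.label bookkeeping in write_entry.
    out, last_label = [], None
    for label, text in entries:
        if label != last_label:
            out.append("<h3>{0}</h3>".format(label))
            last_label = label
        out.append(text)
    return out

entries = [("2015", "entry A"), ("2015", "entry B"), ("1998", "entry C")]
print(render_entries(entries))
```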

The last feature is the enclosing HTML. Here, two functions are interesting: write_prologue and write_epilogue. As the names suggest, write_prologue is called before the entries are written and write_epilogue after them. The latter, in line 10, is straightforward: close the enclosing div, the body, and the complete HTML. It is a simple lambda function. The more complex case is the write_prologue function, because the complete head of the HTML file is written here. In lines 11-21, the prologue is prepared as a class variable. This is done as a triple-quoted string; in that way, the string can span multiple lines. The string is then rendered with self.output in the write_prologue function (lines 24-29). The prepared string contains a "%s" (line 14) that is replaced in line 29 with the right encoding of the HTML file. The string also contains a "{HEAD}" and a "{BODY}". These are placeholders for extra head and body content that can be added. For them, a simple prepout function is given in lines 22-23. However, if this function is never called, the placeholders must be replaced with empty strings. To ensure this, the function is called in write_prologue with empty strings (line 26). If the prepout function was already called before, this call produces an error, because "{HEAD}" and "{BODY}" no longer exist in the string and the inserted content may contain braces of its own. So a try block (lines 25-26) and an except block (lines 27-28) are used to catch this ValueError and do nothing.
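The placeholder handling can be reproduced with plain strings; the JavaScript-like snippet is just a stand-in for head content that contains curly braces:

```python
template = '<head>{HEAD}</head>'

# First call: the placeholder is filled with content that itself
# contains curly braces (as inline JavaScript typically does).
filled = template.format(HEAD="$(function(){ if (x) { } })")

# A second format() call now trips over those braces and raises
# ValueError, so the string is left unchanged -- the same situation
# the try/except in write_prologue protects against.
try:
    filled = filled.format(HEAD="")
except ValueError:
    pass
print(filled)
```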

The main function only needs a small change:

if __name__ == "__main__":
	db = DB()
	orcs = [OrcID(*t) for t in db.getList()]
	db.close()
	alldocs = []
	api = API()
	for orc in orcs:
		alldocs += [d for d in api.getWorks(orc) if orc.start <= d.date <= orc.stop]
	alldocs.sort()
	uniqdocs = [doc for doc,_ in itertools.groupby(alldocs)]
	bib = BibliographyData()
	for d in uniqdocs:
		joinBibliography (bib,parse_string(api.getWork(d),"bibtex"))
	style = HtmlStyle()
	style.sort = lambda x: sorted(x, key = lambda e:-int(e.fields['year']))
	style.format_labels =  lambda x: [int(e.fields['year']) for e in x]
	formatbib = style.format_bibliography(bib)
	back = HtmlBackend()
	back.write_to_file(formatbib,"out.html")

In line 16, the format_labels function is overwritten so that each entry's label is its year instead of a number, and in line 18, the HtmlBackend is used instead of the Backend. The rest is the same.

The result can be downloaded here and looks as follows:

This looks exactly how we want it. So we have created our output.

The complete result of this section can be downloaded here:


Filter HTML Content with Javascript

Filtering with JavaScript is possible but a pain without any framework. So we use a framework here:

The jQuery framework provides functions that make this much easier. For the tutorial, jQuery version 3.2.1 is used.

You need to download the .js file and put it beside your out.html. You need the jquery-3.2.1.min.js file.

Once you have downloaded the file, we can start and change the content of our HTML file to work with it. For this, we can use the extra body and head that we have created in our HtmlBackend. So we start by creating a simple head that imports the JavaScript file:

head = """
<script src="/jquery-3.2.1.min.js"></script>
<script type="text/javascript">
//empty
</script>
"""

There is also an empty inline JavaScript block that is filled later.

We also create a simple body with a search input field that we can then use to filter the data:

body = """
<div class="cd-filter-content">
	<input type="search" placeholder="Try unicorn">
</div>
"""

We can now add a line to the main function to add them to the out.html:

if __name__ == "__main__":
	db = DB()
	orcs = [OrcID(*t) for t in db.getList()]
	db.close()
	alldocs = []
	api = API()
	for orc in orcs:
		alldocs += [d for d in api.getWorks(orc) if orc.start <= d.date <= orc.stop]
	alldocs.sort()
	uniqdocs = [doc for doc,_ in itertools.groupby(alldocs)]
	bib = BibliographyData()
	for d in uniqdocs:
		joinBibliography (bib,parse_string(api.getWork(d),"bibtex"))
	style = HtmlStyle()
	style.sort = lambda x: sorted(x, key = lambda e:-int(e.fields['year']))
	style.format_labels =  lambda x: [int(e.fields['year']) for e in x]
	formatbib = style.format_bibliography(bib)
	back = HtmlBackend()
	back.prepout(head, body)
	back.write_to_file(formatbib,"out.html")

Line 19 is the added line; it passes the head and body to the backend so they are written into the file.

The result looks the same, with the difference that there is now a functionless search input at the top. It can be downloaded here:

To add functionality to the search input, JavaScript needs to be written. All JavaScript code that we write is added to the head string, inside the inline JavaScript block. So everything that you see here must be inserted there if you want to check the functionality yourself.

The first step binds an event to the input field. JavaScript knows different events; here, we want to use the keyup event. That means, every time a key is pressed and released, this event is triggered. To bind this event, we use the jQuery function keyup(). It gets a handler as input, i.e., a function that is called when the event is triggered. Here, we use an anonymous function that simply creates an alert with the content of the search input. However, to make sure that the binding works, we must wait until the HTML is completely loaded before binding the event. This can be done with the jQuery function ready(). We use the simplified form: $(function(){...}). Here is the complete JavaScript:

$(function(){
	search = $(".filter-input input[type='search']")
	search.keyup(function(){
		inputText = search.val().toLowerCase()
		alert(inputText)
	})
})

In line 2, we create a jQuery object called search that contains the input field. To identify this field, we search inside the element of class "filter-input" for an input HTML tag of type "search". This is only true for our search input field, so we have an exact description of it. In line 3, the keyup event is bound. In line 4, the content is loaded from search and saved in inputText. The content is the value of the input and can be retrieved with the jQuery function val(). The resulting string is then transformed to lower case with the standard JavaScript string function toLowerCase(). In line 5, this inputText is used to create an alert.

The result can be downloaded here.

This functionality is not very useful. We do not want to create alerts; we want to filter the content with the input. We add this functionality in the following:

$(function(){
	search = $(".filter-input input[type='search']")
	search.keyup(function(){
		inputText = search.val().toLowerCase()
		$('.mix').each(function() {
			if($(this).text().toLowerCase().match(inputText) ) {
				$(this).show()
			}
			else {
				$(this).hide()
			}
		});
	})
})

In line 5, the jQuery each function is called, which calls an anonymous function for every element with the class "mix". Inside this function, the element for which it is called is available in the variable this. The function checks in line 6 whether the text of the element matches the inputText. The jQuery function text() returns the combined text content of the element, which contains the title, the authors, and the journal. To make sure that the case of the characters does not matter, this string is also changed with toLowerCase. If they match, the element is shown (line 7) with the jQuery function show(). If not, the element should be hidden. This is done in line 10 with the jQuery function hide().

The result can be downloaded here.

We now have a working filter that hides entries that do not contain the search input. However, this only applies to the bibliography entries. The years between them are not filtered, so when the corresponding entries are hidden, the years are still displayed with no content under them. We want to hide these years as well. This can be done with a simple jQuery each call at the right time:

$(function(){
	search = $(".filter-input input[type='search']")
	search.keyup(function(){
		inputText = search.val().toLowerCase()
		$('.mix').each(function() {
			if($(this).text().toLowerCase().match(inputText) ) {
				$(this).show()
			}
			else {
				$(this).hide()
			}
		});
		$('.year').each(function() {
			if ($("."+$(this).text()+ ".mix").is(":visible")) $(this).show()
			else $(this).hide()
		});
	})
})

The new JavaScript is in lines 13-16, directly after the each function for the entries has finished. It again uses the jQuery function each to apply a function to every year element (line 13). These elements can be identified by the class "year". The function uses the jQuery function is to check if any of the entries with this year is ":visible" (line 14). The entries with this year can be found because they have the year as a class and also the class "mix". If there is any entry left, the year is shown (line 14); otherwise, it is hidden (line 15).

The result can be downloaded here.

The complete result of this section can be downloaded here:


Conclusion

If you have done the complete tutorial, you have seen and used some interesting Python syntax that will help you with your own developments.

Maybe you are only here because you are interested in the ORCID API. If that is the case, you have seen some basic functions and many links for further reading. You also have an idea of how to interact with it in Python, which gives you a head start if you want to write your own project in Python as well.

If you are here for any other topic that is part of the tutorial, I hope that you have gotten an idea of how to go on with your problem.

If you have any further questions, feel free to ask me.


However, the result is a functional application that uses ORCID as its basis. Of course, this means that your researchers need to keep their ORCID pages updated. However, this should not be a big problem, because ORCID is becoming more and more popular and widespread. For example, many publishers already require an ORCID to publish with them.

The system still has some open problems:

  • How to insert new authors?
  • What to do with authors that are still active in your group and thus have no end date?
  • What to do with papers that are not in BibTeX format?
  • What to do with papers that are missing data?
  • The title of the same paper could differ slightly between sources.
  • The information on the same paper may differ between source ORCIDs.
  • Filtering by year would be nice.
  • etc.

There are things left to do. If you are interested in using something like this at a production level, look at the GitHub repository, where some of these issues are addressed. If you are also interested in doing something to bring this project further, you can do so on GitHub:

ORC-Schlange on Github