ScaDS Logo

CENTER FOR
SCALABLE DATA ANALYTICS
AND ARTIFICIAL INTELLIGENCE

Static Publications Site-Tutorial (ORC-Schlange) - SQLite in Python

Beitragsseiten

SQLite in Python

The first step is to collect the ORCIDs of you researcher. We must only save the informational content. The "-" are the same for every ORCID so they have no information can be hiden. In our case we have three of them:

  • 0000000219094153
  • 000000020183570X
  • 0000000303977442

These three must be saved somewhere. It is the only part where the program interacts with a DB and it is done only in the backend. Thus there is no possibility to interact with this DB from outside.

We use a really simple SQLite DB with one table "people" that have three fields:

  1. orcid CHARACTER(16)
  2. start DATE
  3. end DATE

The first field is the ORCID of you researcher. The other two represent the periode of time the researcher belongs to your institute. Of course, people changes groups and so a publication that is written before or after there attendance in the institute should not be listed in your publicationlist. In our case, SQLite DB is a good idea due to the simplicity of the DB and the fact that the queries on the DB are not time critical.

To interact with SQLite python has a standard library: sqlite3. This library fulfills the python DB-API 2.0 standard. As a result, it can be replace with any other SQL DB interface and the commands work the same.

First a Connection to the DB is created. Afterwards, a Cursor is retrieved from this connection. The cursor can execute SQL commands. After all commands are executed, the Connection is used to save the data with a commit. At the end, the Connection is closed with the close command.

The following simple python script to fills the DB.

import sqlite3

conn = sqlite3.connect('people.db')
c = conn.cursor()

c.execute("CREATE TABLE people (orcid CHARACTER(16), start DATE, end DATE)")
c.execute("INSERT INTO people VALUES ('0000000219094153','1900-01-01','2016-12-31')")
c.execute("INSERT INTO people VALUES ('000000020183570X','1900-01-01','2016-12-31')")
c.execute("INSERT INTO people VALUES ('0000000303977442','1900-01-01','2016-12-31')")

conn.commit()

conn.close()

Line 3 and 4 creating the connection and the cursor. The file where the SQLite DB is saved is named "people.db". In Line 6 the table is created as described above. In line 7-9 the values of the three ORCIDs are insert. The three researchers stay from January 1, 1900 to December 31, 2016 at our fictional institute. The scripts is only run once. After that, this the data is saved in the people.db and can be read.

Now, a new script is written that contain all classes of this script and also the main function. These script is named __main__.py. These name is used in python to express that a main function of the package is contained in these file. The command "python ." can be used to start the script.

The first class represents a DB object that interacts with the SQLite DB. It has a initialization function, a function to get the data as list and a function to close the connection:

class DB:
	def __init__(self,path="people.db"):
		self.conn = connect(path)
		self.c = self.conn.cursor()
	def getList(self):
		self.c.execute('SELECT * FROM people')
		return self.c.fetchall()
	def close(self):
		self.conn.close()

In line 2 the initialization is startet with the __init__ keyword. The initialization has an optional parameter that contains the path to the DB file. The getList function in Line 6 uses SQL syntax to select all field and all entries from the table people. The DB-API 2.0 function fetchall fetches all entries and return theme as a list of tuples. These tuples contain all three fields as strings.The last function then close the Connection to the DB.

We create a simple main function that uses the DB class:

if __name__ == "__main__":
	db = DB()
	print(db.getList())
	db.close()

It should produce an output like this:

[('0000000219094153', '1900-01-01', '2016-12-31'), ('000000020183570X', '1900-01-01', '2016-12-31'), ('0000000303977442', '1900-01-01', '2016-12-31')]

The complete result of this section can be downloaded here: