A Framework for Collaborative Curation of Neuroscientific Literature

Large models of complex neuronal circuits require specifying numerous parameters, with values that often need to be extracted from the literature, a tedious and error-prone process. To help establish shareable curated corpora of annotations, we have developed a literature curation framework comprising an annotation format, a Python API (NeuroAnnotation Toolbox; NAT), and a user-friendly graphical interface (NeuroCurator). This framework allows the systematic annotation of relevant statements and model parameters. The context of the annotated content is made explicit in a standard way by associating it with ontological terms (e.g., species, cell types, brain regions). The exact position of the annotated content within a document is specified by the starting character of the annotated text, or by the number of the figure, equation, or table, depending on the context. Alternatively, the provenance of parameters can be specified by bounding boxes. Parameter types are linked to curated experimental values so that they can be systematically integrated into models. We demonstrate the use of this approach by releasing a corpus describing different modeling parameters associated with thalamo-cortical circuitry. The proposed framework supports rigorous management of large sets of parameters, solving common difficulties in their traceability. Further, it allows easier classification of literature information and more efficient and systematic integration of such information into models and analyses.
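As a purely illustrative sketch, the localization schemes listed above (starting character, numbered figure/equation/table, bounding box) can be modeled as a tagged union of record types. The class and field names below (`TextLocalizer`, `FigureLocalizer`, `BoxLocalizer`, `Annotation`) are hypothetical and do not reproduce the actual NAT annotation format. The `matches` helper shows how a character offset lets a client check that an annotation still points at the expected excerpt of the text version.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Union


@dataclass
class TextLocalizer:
    """Annotated text located by the offset of its first character."""
    start: int  # index of the first character in the publication's text version
    text: str   # the annotated excerpt itself

    def matches(self, document_text: str) -> bool:
        # The excerpt must appear verbatim at the recorded offset.
        return document_text[self.start:self.start + len(self.text)] == self.text


@dataclass
class FigureLocalizer:
    """Annotation tied to a numbered figure, equation, or table."""
    kind: Literal["figure", "equation", "table"]
    number: str  # e.g. "3B" or "S1"


@dataclass
class BoxLocalizer:
    """Annotation located by a bounding box on a PDF page (page coordinates)."""
    page: int
    x: float
    y: float
    width: float
    height: float


Localizer = Union[TextLocalizer, FigureLocalizer, BoxLocalizer]


@dataclass
class Annotation:
    publication_id: str  # hypothetical identifier scheme
    localizer: Localizer
    comment: str = ""
    # Ontological context terms (species, cell types, brain regions, ...)
    tags: List[str] = field(default_factory=list)
```

For example, `TextLocalizer(start=4, text="membrane time constant").matches("The membrane time constant was 20 ms.")` is true, because the excerpt starts at character 4. Keeping the excerpt alongside the offset is one way to detect annotations that drifted after a change in the text version.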

This service is used to import new publications to the server, to get a local copy of the server-side PDF and text versions, and to visualize annotations in their contexts.
Accessing the features provided by the RESTful service requires Internet access. To minimize this dependency, once a user's access rights to a publication have been verified (see section 1.3), the corresponding server-side files are saved locally so that working on that publication no longer requires network connectivity.
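The local-copy behavior described above amounts to a read-through cache. The sketch below assumes a hypothetical `download` callable standing in for the RESTful request (already subject to the access-rights check) and a hypothetical on-disk layout of one directory per publication; it is not the actual NAT client code.

```python
from pathlib import Path
from typing import Callable, Dict


def get_local_documents(
    pub_id: str,
    cache_dir: Path,
    download: Callable[[str], Dict[str, bytes]],
) -> Dict[str, Path]:
    """Return paths to the local PDF and text copies of a publication.

    `download` is a hypothetical stand-in for the RESTful service call; it is
    only invoked when no complete local copy exists yet.
    """
    pub_dir = cache_dir / pub_id  # hypothetical layout: one folder per publication
    paths = {"pdf": pub_dir / "document.pdf", "txt": pub_dir / "document.txt"}
    if all(p.exists() for p in paths.values()):
        return paths  # offline path: everything is served from the local cache
    files = download(pub_id)  # requires network access and verified rights
    pub_dir.mkdir(parents=True, exist_ok=True)
    for key, path in paths.items():
        path.write_bytes(files[key])
    return paths
```

After the first successful call, later calls for the same publication never touch the network. In practice, `pub_id` should be made filesystem-safe (e.g., DOIs contain slashes).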

Verification of publication access rights
When a publication PDF is imported to the server, it is first parsed to create a plain-text version using the pdftotext command-line program. If the resulting text file is smaller than 2 KB, it contains virtually no text, which indicates that the PDF is most likely a scanned version on which no optical character recognition (OCR) has been performed. In that case, the server performs OCR on the document using the Tesseract Open Source OCR Engine.
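The import step can be sketched with standard command-line tools. The text above names only pdftotext and Tesseract; because the `tesseract` CLI reads images rather than PDFs, this sketch assumes an extra rasterization step with `pdftoppm` (also from poppler-utils). Reading "2 KB" as 2048 bytes is likewise an assumption, and the function names are hypothetical.

```python
import glob
import os
import subprocess
import tempfile

MIN_TEXT_BYTES = 2 * 1024  # "less than 2 KB"; 2048 rather than 2000 is an assumption


def needs_ocr(extracted_text: bytes, min_bytes: int = MIN_TEXT_BYTES) -> bool:
    """A nearly empty text layer suggests a scanned PDF without OCR."""
    return len(extracted_text) < min_bytes


def extract_text(pdf_path: str) -> bytes:
    """Run pdftotext; "-" as the output file sends the text to stdout."""
    return subprocess.run(
        ["pdftotext", pdf_path, "-"], check=True, capture_output=True
    ).stdout


def ocr_pdf(pdf_path: str) -> bytes:
    """Rasterize pages with pdftoppm, then OCR each page image with tesseract."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "page")
        subprocess.run(["pdftoppm", "-r", "300", "-png", pdf_path, prefix], check=True)
        # pdftoppm zero-pads page numbers, so a lexicographic sort keeps page order.
        pages = sorted(glob.glob(prefix + "-*.png"))
        chunks = [
            subprocess.run(
                ["tesseract", page, "stdout"], check=True, capture_output=True
            ).stdout
            for page in pages
        ]
    return b"\n".join(chunks)


def pdf_to_plain_text(pdf_path: str) -> bytes:
    """Use the PDF's own text layer when present; fall back to OCR otherwise."""
    text = extract_text(pdf_path)
    return ocr_pdf(pdf_path) if needs_ocr(text) else text
```

Checking the size of the pdftotext output first keeps the expensive OCR path for documents that actually lack a text layer.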
Users must submit their own copy of the publication PDF before they can download the corresponding reference documents (PDF and text) from the server. When the server receives a user PDF, it computes its MD5 hash and compares it with the MD5 hashes previously associated with this publication. If a match is found, the server-side documents are sent to the client to serve as localization keys. Otherwise, the user PDF is converted to text using the same process described above for importing publications to the server. The generated text is compared against the reference version, and a similarity index is computed between the two texts. If this similarity index is high enough, the MD5 hash of the user PDF is associated with the corresponding publication, and the server-side versions of the PDF and text files are sent to the client. If the similarity index is too low, the client is denied access to the requested server files, since the user's right to access this publication could not be established. The researcher can nevertheless still use the RESTful service to visualize annotations in their context, since sharing small extracts of copyrighted material generally presents no legal issues.
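The two-step check can be sketched as follows. The text does not specify the similarity index or its threshold, so this sketch swaps in a Jaccard index over word 3-grams (cheap even for full-length articles) and a hypothetical cut-off of 0.9. `pdf_to_text` is a hypothetical callable standing in for the server's text extraction (pdftotext, with OCR as fallback). MD5 serves here only as a document fingerprint, not as a security measure.

```python
import hashlib
from typing import Callable, Set, Tuple


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def shingles(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Set of overlapping word n-grams (empty if the text has fewer than n words)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def text_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard index of word n-grams, in [0, 1]; a stand-in for the paper's index."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def verify_access(
    user_pdf: bytes,
    known_hashes: Set[str],
    reference_text: str,
    pdf_to_text: Callable[[bytes], str],
    threshold: float = 0.9,  # hypothetical cut-off
) -> bool:
    digest = md5_hex(user_pdf)
    if digest in known_hashes:         # step 1: cheap exact match, no extraction
        return True
    user_text = pdf_to_text(user_pdf)  # step 2: possibly slow (may involve OCR)
    if text_similarity(user_text, reference_text) >= threshold:
        known_hashes.add(digest)       # remember this PDF variant for next time
        return True
    return False
```

Once a PDF variant passes the similarity test, its hash is recorded, so later requests with the same file are resolved by step 1 alone.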
This two-step procedure (i.e., first checking the MD5 hash, then checking text similarity) was implemented mainly to minimize the need to perform lengthy OCR on users' PDFs. Moreover, MD5 verification alone would not be sufficient, since any modification to the PDF (e.g., adding a "sticky note" comment) changes its MD5 hash and would result in access denial.

Supplementary Tables

Table S1. Example of parameter annotations.