Leiden Research Data Service Catalog
The Language Archive
The Language Archive (TLA) is a unit of the Max Planck Institute for Psycholinguistics concerned with digital language resources and tools. It is a large data archive holding resources on languages worldwide.
Max Planck Institute
Type of service
Usage and appreciation
Large number of partners, see:
There is a contact form and a support forum.
Stage in the research project
Position within the research process
3. Analysing data, 4. Preserving data
3. Shared Research Domain, 5. Public Domain
Type of data
2. Data collections and structured databases
5. Store, 6. Access, use and re-use
Class 3 for public information, Class 2 for internal information [for a limited group]
Three funding bodies: BBAW, KNAW and MPG
The depositors themselves are responsible for compliance with any legal regulations in the area where the data is collected. Where required by national regulations the archive also signs contracts with national/regional institutions. All ethical issues are dealt with by using Codes of Conduct, such as the DOBES Code of Conduct for the DOBES part of the archive. The repository enables the depositors to restrict access to their resources at various levels. All distributed copies elsewhere are stored under the agreement that they are made available under the same access restrictions, if they are made available.
the depositor decides on access permissions; a code of conduct is available
Two copies of every resource are stored within the MPI and at least 4 additional copies are stored in different physical locations in Germany. The storage hardware is replaced at regular intervals with current state-of-the-art equipment. Archival content is checked regularly for file and format integrity. The Sun SAM-FS HSM system used for storage also checks file integrity upon file access. The repository will have 2 identical archive access setups at the backup sites in Göttingen and Munich, so that in case of an emergency the data can be accessed via one of these sites.
TLA requires the right to archive, but does not claim copyright
Data curation strategy
Widely based on open standards; regular quality assessment via Data Seal of Approval
Primary target group
Secondary target group
Classification of the service
Web interface 24/7 ; APIs
In order to be better able to support the proper handling of these moral and/or juridical rights, TLA has implemented four levels of access:
1. Open resources can be accessed immediately.
2. Restricted open resources can be accessed by registered users, who may (as in the case of DOBES) have to agree to a Code of Conduct.
3. In addition to the conditions that hold for restricted resources, protected resources can be accessed on request only. The responsible persons (usually the depositors) will examine the request and, if they grant access, they may do so for a specific use or a limited amount of time, which may have to be agreed upon in a usage declaration.
4. Some sensitive “closed” resources can be accessed only by the depositors (and, e.g., members of the respective speech community).
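The four access levels above can be sketched as a small decision function. This is an illustrative model only, not TLA's actual implementation: the names `AccessLevel`, `User` and `can_access` are hypothetical, and the rule that a depositor can always reach their own resources is an assumption that the text above does not state explicitly.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLevel(Enum):
    OPEN = 1        # accessible immediately
    RESTRICTED = 2  # registered users, possibly after agreeing to a Code of Conduct
    PROTECTED = 3   # restricted conditions plus an approved access request
    CLOSED = 4      # depositors only (community members are not modelled here)


@dataclass
class User:
    registered: bool = False
    accepted_code_of_conduct: bool = False
    request_granted: bool = False
    is_depositor: bool = False


def can_access(level: AccessLevel, user: User, requires_code_of_conduct: bool = True) -> bool:
    """Return True if the user may access a resource at the given level."""
    if level is AccessLevel.OPEN:
        return True
    # Assumption: depositors can always access their own deposits.
    if user.is_depositor:
        return True
    if level is AccessLevel.CLOSED:
        return False
    meets_restricted = user.registered and (
        user.accepted_code_of_conduct or not requires_code_of_conduct
    )
    if level is AccessLevel.RESTRICTED:
        return meets_restricted
    # PROTECTED: restricted conditions plus a request granted by the responsible person.
    return meets_restricted and user.request_granted
```

In this sketch each level adds a condition on top of the previous one, which mirrors the wording "in addition to the conditions that hold for restricted resources" in level 3.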
The repository in principle makes the original deposited objects available in an unmodified way, if the objects were in one of the accepted file types and encodings. Additionally, lower quality distribution copies of audio and video recordings are made available. New versions of archived resources can be deposited, in which case the old versions will be moved to a version archive. Different versions of the same resource are not compared; we assume the depositor has good reasons for depositing a newer version. A new version of a resource will get a new persistent identifier; the old version will keep the original persistent identifier. Metadata can change if the depositor or archivist sees the need for that, in the case of errors or missing information. Changes to the metadata are currently not logged. All archived objects are linked to their metadata descriptions and are organized in hierarchical (or multi-rooted) tree structures to indicate relationships between objects and sets of objects. The tree structures can change if the depositors decide that this is necessary. The identities of the depositors are checked by means of a login and password when they deposit material online. Provenance metadata as to who made changes to the repository is currently only stored in log files and not shown to the data consumer.
Data sets can be embargoed
Accepted metadata formats
Accepted content types
Text, Audio, Video, Images
Accepted preferred formats
Archived resources preferentially make use of UNICODE; XML; generic models such as ISO LMF, ISOcat DCIF; RDF; XML-EAF; IMDI/CMDI; MPEG 2/1/4; mJPEG2000; JPEG/TIFF/PNG; 48 kHz-16 bit linear PCM
Accepted file formats
Maximum size of deposits
No information was found
Quality assurance is the responsibility of the depositor
Metadata is openly available. Log in is required to download data files.
Scholars affiliated with federated organisations may log in using their institutional account. Others can create a guest account on the registration page.
Tools / Interfaces for access
The repository provides various ways of utilizing the archived data via online tools as well as by downloading the data in formats commonly used by the research communities. An advanced metadata search utility is provided, as well as a deep search tool for textual content. All metadata can be harvested via the OAI-PMH protocol. Unique persistent identifiers according to the Handle system are provided for each archived object.
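As a rough illustration of the two access mechanisms mentioned above, the following sketch builds an OAI-PMH `ListRecords` request URL and a resolver URL for a Handle. The base URL and the handle value are placeholders, not TLA's real endpoint or identifiers, and the helper names are hypothetical. The sketch relies only on what the OAI-PMH protocol itself defines (the `verb`, `metadataPrefix` and `resumptionToken` arguments, and the mandatory `oai_dc` format) and on the public Handle proxy at `hdl.handle.net`.

```python
from typing import Optional
from urllib.parse import urlencode


def build_list_records_url(base_url: str,
                           metadata_prefix: str = "oai_dc",
                           resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    A resumptionToken is an exclusive argument in OAI-PMH, so when one is
    given it replaces metadataPrefix rather than being sent alongside it.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def handle_to_url(handle: str) -> str:
    """Turn a Handle (prefix/suffix) into a URL on the global Handle proxy."""
    return "https://hdl.handle.net/" + handle


if __name__ == "__main__":
    # Placeholder endpoint and handle, for illustration only.
    print(build_list_records_url("https://example.org/oai"))
    print(handle_to_url("1234/example-object"))
```

A harvester would fetch the first URL, parse the XML response, and repeat the request with the returned resumption token until no token is returned.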
Long term guarantees
MD5 checksums are calculated for all objects and checked periodically. The availability of files on the file system is checked automatically daily. The availability of the archive access tools is checked automatically multiple times a day. The availability of file, web and application servers is monitored continuously. New versions of archived resources can be deposited, in which case the old versions will be moved to a version archive. In the future these old versions will also be made available to the end users but this is currently not yet the case.
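A periodic checksum check of this kind can be sketched as follows. This is a minimal example of MD5-based fixity checking in general, not the archive's own tooling; the function names are hypothetical, and the stored checksum is assumed to be a hexadecimal string recorded at deposit time.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that large audio or video files are not loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: str, expected_md5: str) -> bool:
    """Return True if the file still matches the checksum stored for it."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A depositor can apply the same idea locally, recording a checksum before upload and comparing it against a later download.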
Complies with international standards for trusted repositories
TLA has an explicit mission to archive language resources from all around the world, collected both by associated researchers and by researchers who are not affiliated with the federated organisations. The mission is supported by the official possibility to store full copies at two computer centers at different locations, for which the president of the Max Planck Society gives an institutional backing of 50 years of bit-stream preservation. We are working on duplicating the archive access framework in those backup locations as well, so that access to the data can be provided even if our institute were to cease to exist.
No information was found
We urge all researchers, institutions and individuals that are in the possession of such data: please do seriously consider the need for a long-term preservation plan in order to assure that these data will be available for future generations.
Agreements with Leiden University
Not necessary; researchers from applicable fields may deposit their linguistics data