Presentation of the project
CORVAM is an online corpus comprising audio files compiled during fieldwork carried out in various localities in the Maghreb, several text files dealing with linguistic and historical analyses, and photographic material. The purpose of this corpus is to preserve as much audio data as possible in order to reflect the vast dialect variation in the contemporary Maghreb, thus contributing to the preservation of its oral heritage.
From a scientific point of view, CORVAM constitutes a useful database for producing synchronic descriptions, comparative studies or sociolinguistic analyses of Arabic and Berber varieties in the Maghreb. Furthermore, the contents of the texts will also be relevant for anthropological and ethnographic work.
The process of creating a speech corpus is lengthy and involves two groups of participants: informants, who kindly collaborate, and researchers, who compile and analyse the data.
The research work behind CORVAM was carried out in several stages. The first phase of any dialectological investigation consists of reviewing the existing documentation on the chosen dialect in order to select the most significant issues to be studied and to plan the next phase (establishing the typology of the variety, considering other languages present in the area, etc.). The second phase consists of fieldwork in the chosen area (finding suitable informants, making initial contacts and recordings, compiling specific information for completing paradigms, etc.). The third phase involves transcribing and translating the corpus. The fourth and final phase consists of analysing the material gathered in order to interpret it, compare it with other dialects and draw relevant conclusions.
Presentation of data
Data are presented by means of an interactive map from which each of the analysed localities may be accessed. For each locality, two types of data are available. On the one hand, the linguistic data comprise a brief description of the dialect’s main features and bibliographic references to existing works, as well as at least one audio file, its broad phonetic transcription, a translation (into Spanish, English or French) and a metadata file covering the sociolinguistic profile of the informants and other details of the recording. On the other hand, historical information on each locality has been added, as its analysis may also contribute to understanding the diachronic evolution of the varieties under study. A photo gallery is also included.
In some cases, the audio file is a popular song recorded in a studio, which was considered relevant because of its use of the dialect and because it forms part of the region’s oral heritage. In these cases, additional information on the singers is provided.
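As an illustration only, the materials listed above for each locality could be modelled along the following lines. This is a minimal sketch, not the actual CORVAM data model: the class names, field names and the `is_complete` check are all hypothetical and were chosen simply to mirror the items named in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Recording:
    """One audio file with its accompanying documents (hypothetical layout)."""
    audio_file: str                # name of the audio file
    transcription: str             # broad phonetic transcription
    translation: str               # translation of the transcribed text
    translation_language: str      # Spanish, English or French
    # Sociolinguistic profile of the informant and other recording details.
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Locality:
    """Everything reachable from one point on the map (hypothetical layout)."""
    name: str
    dialect_description: str = ""
    bibliography: List[str] = field(default_factory=list)
    recordings: List[Recording] = field(default_factory=list)
    historical_notes: str = ""
    photos: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # The text asks for at least one audio file per locality.
        return len(self.recordings) >= 1
```

A locality entry would then only be considered ready once a first recording, with its transcription, translation and metadata, has been attached to it.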
Informants must be natives of the chosen locality and have lived most of their lives there. Informants of any age, gender and level of education are suitable. Their sociolinguistic profile is described in the metadata file. Informants remain anonymous for data protection purposes: the name assigned in each file is fictitious or incomplete to prevent identification.
All documents published in CORVAM must have the prior consent of the informants. The researchers who compiled the audio data ensure that this consent has been appropriately given and are responsible for any implications of publication on the website.
Content and methodology
Document contents are very varied. No topic restrictions apply, save for topics that may be offensive or may endanger the informants or the researcher.
The information compiled consists of conversations with the informant prompted by the researcher; in some cases, these took place among groups of relatives or friends. The directed interview method is used: the interviewee is asked about ordinary situations, with the least possible intervention and the fewest possible interruptions from the interviewer, so as to allow speech to flow naturally. This contributes to fluent and spontaneous speech, even though the persons involved are aware of being recorded. Where the situation required more precise information, the elicitation method was also used. Where the interview is long, the most representative fragment is chosen in terms of voice clarity, sound quality, spontaneity of speech, etc.
Preferably, the researcher explains the purpose of the interview to the informant so that the informant may be actively involved and take part in the research. Nonetheless, as is well known, this is not always possible.
It is very important to ensure from the outset that the interview is adapted to the specific Arabic or Berber variety under investigation, to prevent linguistic traits from a different variety being wrongly identified as belonging to it.
The researchers are responsible for their transcriptions and translations into the chosen language, and for ensuring that recordings correspond to the variety to which they are assigned.
Concerning the transcription of audio files, a broad phonetic transcription (in which the most salient phonetic features are noted) has been chosen, though some texts may have a phonemic transcription (taking into account some allophones).
We have decided to mark vowel quantity for pedagogical reasons, even though the difference between long and short vowels is not always perceptible, above all in some Moroccan varieties.
Audio files are in mp3 format. Written documents must use Unicode (preferably with the Doulos SIL font).
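By way of illustration, a contributor could check these two requirements before submitting material with something like the sketch below. It rests on assumptions that the text does not state: that "Unicode" is taken to mean UTF-8 encoded text, that NFC normalisation is wanted so that letters with diacritics are stored consistently, and that the file extension is enough to identify an mp3. The function names are hypothetical.

```python
import unicodedata


def is_mp3_name(filename: str) -> bool:
    """Check the file extension only; the audio content is not inspected."""
    return filename.lower().endswith(".mp3")


def decode_unicode_text(raw: bytes) -> str:
    """Decode a written document and normalise its transcription characters.

    Assumption: documents are stored as UTF-8. Bytes that are not valid
    UTF-8 raise UnicodeDecodeError, which flags a legacy encoding.
    """
    text = raw.decode("utf-8")
    # Assumption: NFC is preferred, so a base letter followed by a combining
    # diacritic becomes a single precomposed character where one exists.
    return unicodedata.normalize("NFC", text)
```

Normalising in this way means that a transcription symbol typed as a base letter plus a combining dot below and the same symbol typed as one precomposed character compare as equal when the texts are searched.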