Managing and improving European sampling practice
Obtaining good probability samples that represent the population is a key challenge for European cross-national studies. This report gives an overview of the sampling frames used in countries participating in the four cross-European surveys cooperating in SERISS: the European Social Survey (ESS), the European Values Study (EVS), the Generations and Gender Programme (GGP), and the Survey of Health, Ageing, and Retirement in Europe (SHARE). The overview shows where possibilities exist to jointly build and share sampling frames, and where studies not using an existing population register can profit from the experience of other studies which do have access to such a register in the same country. It provides a knowledge database of national sampling procedures and accessible population registers across Europe and offers a way to improve harmonization of sampling frames and sample data across European surveys. The report is accompanied by an Excel file which provides a full listing of the registers used by the four surveys across 24 countries.
This report addresses the quality of the population registers which are currently being used as sampling frames in countries participating in the four cross-European surveys cooperating in SERISS: the European Social Survey (ESS), the European Values Study (EVS), the Generations and Gender Programme (GGP), and the Survey of Health, Ageing, and Retirement in Europe (SHARE). It summarizes what efforts have been undertaken by register authorities to improve and update the registers and presents an inventory of the main problems encountered in the field by survey sampling experts. In addition, it discusses the quality of alternative methods of sampling and possible improvements.
In an attempt to improve the quality of sampling for European survey infrastructures by obtaining access to population registers in all countries, a letter was developed to be sent by survey infrastructures to relevant register authorities, making the case for survey access. The letter is a template that can be used by national teams and national funders, or sent jointly by all directors of the four cross-European surveys cooperating in SERISS to exert more pressure. This report describes how the letter was developed and how arguments to which register authorities are responsive were identified. The report also explains how the letter can be used in practice.
Learning from administrative data
The potential for using auxiliary or contextual data for sample-based nonresponse adjustments recently gained more attention. However, identifying and accessing auxiliary data for nonresponse analysis, especially data which is of sufficiently high quality, presents a challenge. Data availability and access conditions can vary across countries and across organisations. This deliverable provides an inventory of auxiliary data that are available in registers used as sampling frames in the four major cross-European social surveys participating in SERISS: SHARE, ESS, GGP and EVS. Information is based primarily on findings from an expert survey among the researchers of these studies’ country teams. Findings are augmented with information on auxiliary data sources on the European level. In addition to this summary report, an accompanying Excel file provides detailed information about the auxiliary data available in registers used by the four surveys across 24 countries. This resource provides an opportunity to compare and learn from the experiences of the four major cross-national surveys being conducted in Europe today and identify potential sources of auxiliary data for future use.
D2.6 – Joint workshop with country representatives of SHARE and ESS to discuss available data sources and potential for subsequent use
The goal of this deliverable was to facilitate exchange between country representatives of ESS and SHARE by hosting a joint workshop. Ten survey researchers met for an interactive one-day workshop to discuss obstacles, opportunities for synergies through joint sampling, and the potential of auxiliary data for efficient sampling and nonresponse analyses.
This report provides an overview of different types of auxiliary data and their potential for nonresponse bias adjustment. To this end, a scoping study with partly simulated data from Germany was conducted in which selected auxiliary variables were examined regarding their usability in nonresponse bias adjustment. Two different adjustment methods – response propensity weighting and the bivariate probit model with sample selection – were applied and evaluated regarding their potential for bias reduction. The second part of the report focuses on findings about different (cross-national) data sources of auxiliary variables.
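As a rough illustration of response propensity weighting, the sketch below estimates response propensities as weighted response rates within cells of one auxiliary variable and divides each respondent's design weight by the propensity of their cell. It is a minimal weighting-class variant, not the model used in the report. The function name, the field names (design_weight, responded, urbanity) and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def propensity_weights(sample, cell_var):
    """Weighting-class version of response propensity weighting.

    sample   : list of dicts with 'id', 'design_weight', 'responded' and an
               auxiliary variable known for respondents AND nonrespondents.
    cell_var : name of the auxiliary variable that defines the cells.
    Returns (adjusted weights of respondents by id, propensity by cell).
    """
    weighted_total = defaultdict(float)
    weighted_resp = defaultdict(float)
    for unit in sample:
        cell = unit[cell_var]
        weighted_total[cell] += unit["design_weight"]
        if unit["responded"]:
            weighted_resp[cell] += unit["design_weight"]

    # Estimated response propensity = weighted response rate in the cell.
    propensity = {c: weighted_resp[c] / weighted_total[c] for c in weighted_total}

    # Respondents are weighted up by the inverse of their cell's propensity.
    adjusted = {
        unit["id"]: unit["design_weight"] / propensity[unit[cell_var]]
        for unit in sample
        if unit["responded"]
    }
    return adjusted, propensity


# Toy gross sample (hypothetical values, for illustration only).
toy_sample = [
    {"id": 1, "design_weight": 1.0, "urbanity": "urban", "responded": True},
    {"id": 2, "design_weight": 1.0, "urbanity": "urban", "responded": True},
    {"id": 3, "design_weight": 1.0, "urbanity": "urban", "responded": False},
    {"id": 4, "design_weight": 1.0, "urbanity": "urban", "responded": False},
    {"id": 5, "design_weight": 1.0, "urbanity": "rural", "responded": True},
    {"id": 6, "design_weight": 1.0, "urbanity": "rural", "responded": True},
]
adjusted, propensity = propensity_weights(toy_sample, "urbanity")
```

In this toy sample the adjusted respondent weights add up to the total design weight of the gross sample, which is the property the adjustment is meant to restore. In practice the propensities would usually come from a regression model on several auxiliary variables rather than from a single cell variable.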
Weighting for complex survey designs
This report reviews two broad classes of weighting methods to compensate for nonresponse errors in sample surveys: the nonresponse calibration approach and the propensity score approach. We first show that arbitrary choices of the distance function characterizing the calibration methodology correspond to assuming, at least implicitly, alternative parametric models of the nonresponse process. As a natural extension of the nonresponse calibration approach, we then introduce the propensity score approach, which allows us to improve the robustness of the survey weights by estimating an explicit model for the nonresponse process. Since the choice between these two approaches is not always clear-cut, we also consider a two-step procedure which involves a calibration adjustment in the first step and a propensity score adjustment in the second step. Finally, the report concludes with a discussion of the important role played by the auxiliary information which is available to compensate for nonresponse errors.
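To make the calibration idea concrete, the sketch below implements raking (iterative proportional fitting), the calibration variant that corresponds to a multiplicative distance function. Starting from initial weights, it rescales them one margin at a time until the weighted sample totals match known population margins. This is an illustrative sketch only, not the procedure from the report. The function name, variable names and population margins are invented for the example.

```python
def rake(units, margins, n_iter=100):
    """Calibrate weights to known population margins by raking.

    units   : list of dicts holding an initial 'weight' and the
              categorical calibration variables.
    margins : {variable: {category: population total}}.
    Returns the list of calibrated weights, in the order of `units`.
    """
    weights = [u["weight"] for u in units]
    for _ in range(n_iter):
        for var, targets in margins.items():
            # Current weighted total per category of this variable.
            totals = {}
            for u, w in zip(units, weights):
                totals[u[var]] = totals.get(u[var], 0.0) + w
            # Scale every weight so this margin is matched exactly.
            weights = [
                w * targets[u[var]] / totals[u[var]]
                for u, w in zip(units, weights)
            ]
    return weights


# Hypothetical respondents and population margins (illustration only).
respondents = [
    {"weight": 1.0, "sex": "f", "age": "young"},
    {"weight": 1.0, "sex": "f", "age": "young"},
    {"weight": 1.0, "sex": "f", "age": "old"},
    {"weight": 1.0, "sex": "m", "age": "young"},
    {"weight": 1.0, "sex": "m", "age": "old"},
]
population_margins = {
    "sex": {"f": 60.0, "m": 40.0},
    "age": {"young": 30.0, "old": 70.0},
}
calibrated = rake(respondents, population_margins)
```

Other distance functions (for example the linear one behind generalised regression weights) lead to different weight adjustments for the same margins, which is the sense in which the choice of distance function implies a model for the nonresponse process.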
This report describes a database containing the information needed to compute the population margins used for the calibrated weights in the first six waves of the Survey of Health, Ageing and Retirement in Europe (SHARE). It includes an overview of the database, documentation, data sources and a detailed description of the dataset. The dataset itself is provided as a Stata file exported to Excel. A separate Stata code with a simple example of how to use the database is available on request.
This report describes the Stata programs available to create calibrated weights based on data from the first wave of the Survey of Health, Ageing and Retirement in Europe (SHARE). Since the basic units of analysis can be either individuals or households, the report illustrates the computation of both calibrated cross-sectional individual weights for inference to the target population of individuals and calibrated cross-sectional household weights for inference to the target population of households.
This report describes the Stata programs that are provided to users of the Survey of Health, Ageing and Retirement in Europe (SHARE) in order to compute their own calibrated longitudinal weights. Since the basic units of analysis can be either individuals or households, the report covers the computation of both calibrated longitudinal individual weights for inference to the target population of individuals and calibrated longitudinal household weights for inference to the target population of households. It also includes a description of the standards in the documentation of the sampling process and the resulting gross samples that can be seen as a blueprint for large-scale cross-national surveys. To guarantee consistent and uniform sampling quality that meets the high standards in SHARE, it is required to document the specific details in a Sample Design Form (SDF).
Handling item nonresponse
This report describes the flexible algorithm used to produce multiple hot-deck imputations (MHDI) of the non-monetary variables in the first six waves of the Survey of Health, Ageing and Retirement in Europe (SHARE), excluding the retrospective wave in 2008. After agreeing on the rationale and the key logical steps of the SHARE algorithm, the report’s authors focus on its implementation in the statistical software Stata (version 13.1) by providing a detailed description of the programs used for handling a number of issues in the construction of the SHARE public-use imputation dataset.
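The general idea behind multiple hot-deck imputation can be sketched as follows: each missing value is replaced by the observed value of a randomly drawn donor from the same imputation class, and the draw is repeated to obtain several completed datasets. This is a generic illustration in Python, not the SHARE algorithm or its Stata implementation. The function name, the class variables (country, sex), the target variable (smoker) and the records are hypothetical.

```python
import random


def multiple_hot_deck(records, target, class_vars, m=5, seed=0):
    """Return m completed copies of `records` with `target` imputed.

    Missing values are coded as None. Donors are respondents with an
    observed value in the same imputation class. A class without any
    donor raises KeyError; a real implementation would collapse classes.
    """
    rng = random.Random(seed)

    # Build the donor pool for each imputation class.
    donors = {}
    for rec in records:
        if rec[target] is not None:
            key = tuple(rec[v] for v in class_vars)
            donors.setdefault(key, []).append(rec[target])

    datasets = []
    for _ in range(m):
        completed = []
        for rec in records:
            row = dict(rec)  # copy, so the input records stay untouched
            if row[target] is None:
                key = tuple(row[v] for v in class_vars)
                row[target] = rng.choice(donors[key])
            completed.append(row)
        datasets.append(completed)
    return datasets


# Hypothetical records (illustration only).
records = [
    {"id": 1, "country": "DE", "sex": "f", "smoker": "yes"},
    {"id": 2, "country": "DE", "sex": "f", "smoker": "no"},
    {"id": 3, "country": "DE", "sex": "f", "smoker": None},
    {"id": 4, "country": "IT", "sex": "m", "smoker": "no"},
    {"id": 5, "country": "IT", "sex": "m", "smoker": None},
]
imputations = multiple_hot_deck(records, "smoker", ["country", "sex"], m=3)
```

Analyses are then run on each completed dataset and the results combined, so that the variation between imputations reflects the uncertainty about the missing values.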
The imputed data set for SHARE Wave 7 is available as part of the current data release 7.0.0 from the SHARE website.
This report describes the flexible Stata code used to create multiple imputations (MI) of the missing monetary variables collected in the Survey of Health, Ageing and Retirement in Europe (SHARE). Unlike missing data on other types of variables, which are imputed by the hot-deck approach (see SERISS deliverable D2.12), missing monetary amounts on open-ended questions about income, wealth and consumption expenditure items are imputed by the fully conditional specification (FCS) method proposed by van Buuren et al. (1999) and Raghunathan et al. (2001). The main advantage is that all monetary variables are imputed jointly to preserve their correlation structure.
The main objective of this report is to provide an overview of how the GGP has sought to adapt the SHARE imputation procedure to the Generations and Gender Survey (GGS) data from Belarus. The first conclusion is that the SHARE team have done an extensive and excellent job at providing high quality imputations for what is an exceptionally complex and economically focused questionnaire. The second conclusion is, however, that the resulting imputation approach is tailored to the nuances of the SHARE design and that there is no marginal added value in making the imputation procedure available to other infrastructures. An annex to this deliverable is the generic questionnaire of the GGP.
Including the institutional population
The large European social surveys usually exclude residents living in institutions from their samples. In 2016, the SERISS project started to investigate the possible consequences of this exclusion. The project examines the feasibility of sampling and surveying the institutionalized population. This report introduces the first release of an inventory of surveys that include the institutional population and describes the contents and sampling approaches of national and cross-national surveys that interviewed institutionalized respondents. Moreover, the report advances a detailed definition of institutions and the institutionalized population and briefly describes the size and statistical distinctiveness of this subgroup in European countries.
Many surveys exclude institutionalized populations – a deliberate decision on the basis of a cost-benefit analysis. This report focuses on some of the people who live in those institutions, the type of institution they live in, the availability of data to select them to be included in representative samples and their ability to be reached in practice. This new evidence base can be used by researchers to better assess the pros and cons of including the institutionalized population in their surveys.
Survey experiments to compare two translation approaches: the ‘stay close to the source’ approach & the adaptive approach
D3.1 – Standards for the implementation of the two survey translation approaches: the ‘stay close to the source’ approach & the adaptive approach
Under SERISS, an experiment has been set up to test two different approaches to questionnaire translation. The so-called ‘ask-the-same-question’ approach is a rather close translation method and has been the basic rule in most multilingual surveys. It will be tested against an approach that allows a higher degree of adaptation. The major surveys have so far not applied this adaptive approach out of concern that too large a distance between the source and target versions may hamper comparability between the different language versions.
This experiment will be carried out in two languages: Estonian in Estonia and Slovene in Slovenia. In each country, three translation teams will translate a total of 60 questionnaire items from English, subdivided into three sets of 20 items each, applying both methods. To minimise team effects, an elaborate research design has been worked out: each item will be translated three times into each of the languages, twice following one method and once following the other method. The resulting translations will be fielded on the CRONOS web panel (WP7) in late 2017.
D3.2 – Report on the translation process used in testing close vs. adaptive questionnaire translation approaches
An experiment testing two different questionnaire translation approaches was carried out in 2017: a close and a more adaptive translation approach were applied in a series of translation sessions. The experiment was carried out in Estonian and Slovene, and the resulting translations were fielded in the CRONOS online panel developed as part of SERISS. The set-up of the translations, the team composition and the findings from the translation sessions are presented and discussed in this report.
Feasibility of applying computational linguistic methods to survey translation
D3.5 – First SERISS Symposium on synergies between survey translation and developments in translation sciences. Minutes of the expert meeting
This report summarises the discussion of the expert meeting organized as part of the “1st SERISS Symposium on synergies between survey translation and developments in translation sciences” which took place on 1-2 June 2017 at Universitat Pompeu Fabra (UPF) in Barcelona. The expert meeting brought together 15 experts from the public and private sectors in the fields of comparative survey methodology, computational linguistics, and translation and language sciences to discuss how current advances in computational linguistics could potentially improve survey translation.
This report investigates whether tools used in computational linguistics can be useful for survey translation. The report reviews the main tools on offer and focuses specifically on machine translation (MT) and on common translation technologies that have contributed towards MT. The report then reviews these applications in the context of survey translation, highlighting areas where there may be an opportunity to incorporate tools to assess translated survey questionnaires.
This report summarises the presentations and discussion of the SERISS seminar ‘Establishing a roadmap for the use of computational linguistics’ tools and translation technologies in survey research’. This seminar took place on March 14 and 15, 2018 at Universitat Pompeu Fabra (UPF) in Barcelona. Presentations delivered at the meeting can be downloaded through the Survey Experts Network section.
D3.8 – Computational and corpus linguistic tools in questionnaire translation: guidelines to adopt and test for innovations in survey translation
This deliverable includes guidelines to adopt and test for innovations in survey translation. Specifically, it focuses on two areas: 1) preparing survey items’ texts for Computational Linguistics (CL) methods and translation technologies by the creation of multilingual parallel corpora and 2) enhancing procedures in survey translation with functionalities commonly present in commercial translation technologies.
Comparative assessment of thesaurus key words
This deliverable describes two methods that were used to assess the translation quality of the European Language Social Science Thesaurus (ELSST). ELSST is a multilingual social science thesaurus that was developed to aid cross-language information retrieval of social science datasets, including cross-national survey datasets, in the Consortium of European Social Science Data Archives (CESSDA) data portal. It is available in 12 languages, including English, the source language, from which target language versions are derived.
The first evaluation method is re-translation (referred to in this deliverable as ‘back-translation’), which is a standard evaluation method used for assessing the translation quality of thesauri and related vocabularies. A subset of French and German ELSST terms were back-translated, and results analysed to detect errors in the target language terms, including unintended ambiguity. The second evaluation method compares the set of ELSST index terms in all languages that were assigned to the same cross-national datasets by different CESSDA archives and associated archives. Differences in the sets of index terms assigned were analysed to see if they were due to differences in the interpretation of the terms by the indexers who assigned them.
These guidelines describe how to manage the content of the European Language Social Science Thesaurus (ELSST), following international standards and best practice. They are aimed primarily at ELSST translators and content developers, but will also be of interest to end-users. The guidelines take account of ISO 25964-1, the latest international standard on thesaurus construction, published by the International Organization for Standardization, as well as the work of other knowledge organisation experts and thesaurus developers.
The aim of this deliverable is to investigate the application of indexing terms in the data lifecycle. In particular, it investigates how and where to apply index terms to survey questions. The work describes an experiment to index survey questions drawn from three cross-national surveys: the European Social Survey (ESS), the European Values Study (EVS), and the Survey of Health, Ageing and Retirement in Europe (SHARE) using terms from the European Language Social Science Thesaurus (ELSST). A secondary aim of the deliverable is to investigate how well ELSST terms cover the subject content of the selected questions from the above three surveys. The ELSST terms used in the experiment were taken from the German, Greek, and Romanian versions of the thesaurus to index survey questions in the corresponding languages.
Updating the Translation Management Tool
Programmers from CentERdata, University of Tilburg have been working with several survey infrastructures to develop the Translation Management Tool (TMT), a web environment that centrally stores translations for large international multilingual studies. The TMT was originally developed for the Survey of Health, Ageing and Retirement in Europe. Under SERISS the TMT has been adapted to support other large scale studies that need to translate survey questionnaires including ESS, EVS and GGP. The program has been modularized for easier adaptation to other research infrastructures.
This document provides further information on the tool and details of how it can be configured for different surveys. A video demonstration of the tool and user guides for the ESS and EVS configurations of the TMT are available online.
In conjunction with WP8, an online environment – the surveycodings-backend – has been developed to enable researchers to centrally manage and store sets of classified strings to be used in the coding of open-ended survey questions. This may include sets of strings which have been translated into multiple language versions.
Questionnaire Design and Documentation Tool
The Questionnaire Design and Documentation Tool (QDDT) is intended to provide an interactive and dynamic web-based tool which can be used to both document and retrieve information on the complex process of designing a cross-national survey questionnaire. It should provide a more streamlined and user-friendly replacement for the current paper-based Word template used to document the European Social Survey’s questionnaire design process. Further information can be found on the QDDT Wiki.
This report provides information on the QDDT, which was developed for the European Social Survey and beta tested during the development of the ESS Round 9 questionnaire (2016-2018) prior to a large-scale roll-out for ESS Round 10. The report is accompanied by full technical documentation for the QDDT, a user guide and conventions document, a step-by-step worked example of how content is versioned and published within the QDDT, and example PDF outputs from the documentation of the two Round 9 rotating modules.
Question Variable Database
D4.2 – Report on populating a Question Variable Database (QVDB) with questions and variables from the European Social Survey (plus further feasibility testing by other survey infrastructures)
The Question Variable Database (QVDB) is a system for storage and retrieval of survey questions and variables, facilitating reuse of their metadata and metadata components. The overall aim of the QVDB is to serve the European Social Survey (ESS) and other survey programmes in their work with specifying, documenting, versioning and disseminating survey data. The tool also aims to make the survey questions and variables findable, available, searchable and re-usable for the wider research community.
Testing the Translation Management Tool (TMT) in ESS Round 8
D4.3 – Testing specifications for real-time testing of Translation Management Tool (TMT) in ESS Round 8
The Translation Management Tool (TMT) is a web-based tool specially designed to allow translators to translate questionnaires without the burden of understanding complex routing and programming codes for large multilingual questionnaires. Developed by CentERdata (University of Tilburg), it has been used in several large international studies including the Survey of Health, Ageing and Retirement in Europe (SHARE). Under SERISS the TMT has been extended to accommodate European Social Survey (ESS) translation procedures, which use TRAPD (Translation – Review – Adjudication – Pretesting – Documentation) as well as verification and SQP coding. The extended TMT was tested by three countries when carrying out their translations for ESS Round 8 (2016/17). This deliverable provides details of how those involved in the translation process – particularly the national teams and the verifier, cApStAn – were briefed on how to use and test the TMT.
This document presents and discusses the findings from this real-time testing in the ESS during its preparations for Round 8 in 2016. The following national teams carried out the real-time testing of the TMT in ESS Round 8: Poland (into Polish), Lithuania (into Lithuanian and Russian) and Russia (into Russian). In addition to the findings from these three national teams, this report includes observations from the ESS translation expert (author of this document) and from cApStAn, an external service provider of linguistic verification of the questionnaire translations in the ESS.
Extending SHARE’s fieldwork management tools
SHARE has developed a set of tools – their Sample Distributor/Sample Management System (SD/SMS) – that are used to manage all fieldwork processes. These distribute addresses to interviewers, monitor their contact attempts, install the correct version of the questionnaire, give access to pre-loaded data, record fieldwork success, and transmit the data back to the agency and central coordination. This toolbox is very complex since it has to accommodate many different fieldwork situations, different survey instruments according to respondent type, different languages, and varying levels of electronic hardware/software sophistication in different countries. Under SERISS, SHARE is rewriting the tools to make the adaptation to different situations more transparent and easier to handle, and to facilitate their take-up by other surveys.
Researchers on the Survey of Health, Ageing and Retirement in Europe have been working with programmers from CentERdata, University of Tilburg to develop a new tablet-based Sample Management System (tablet SMS) to replace the current server/desktop/laptop version. The main reason for this update is that the current client-server based software (originating from 2002) is outdated. Replacing it with modern web-based applications will reduce maintenance costs, make the software easier to set up and use (and allow it to be provided as a service), and open up the use of the software on modern devices like tablets and smartphones. Finally, designing and developing the software in a more generic way opens up the possibility for other cross-national survey projects like ESS, EVS and GGP to use it. The prototype tablet SMS is available to view online.
This deliverable provides industry-style software manuals for the Sample CTRL and Case CTRL software developed by CentERdata for SHARE, EVS, ESS and GGP. CentERdata extended the tooling to newer Windows, Android and iOS operating systems to support use on handheld tablets and smartphones. They also improved the detailed fieldwork statistics at the household, respondent, interviewer, sample, and subsample level, which can be used for real-time fieldwork monitoring at the agencies and adapted to the special needs of SHARE, EVS, ESS, and GGP.
A Fieldwork Management System (FMS) for ESS
A key challenge facing social surveys, especially those operating cross-nationally, is to monitor and manage fieldwork effectively. As part of the SERISS project, a new electronic fieldwork management system (FMS) has been developed for the European Social Survey. The FMS consists of two components: a mobile app to be used by interviewers in the field to manage their caseloads and complete contact records on the doorstep and a centralised case management system (CCMS) to manage the exchange of data between the survey agency and interviewers and maintain a central database which can be used for fieldwork progress monitoring.
D4.7 – Fieldwork Management and Monitoring System (FMMS) for the European Social Survey: Report on the feasibility of cross-national implementation
The FMS offers clear potential benefits in terms of providing ESS stakeholders with access to consistent and timely data on fieldwork progress. However, there are implementation issues which need to be addressed before the FMS can be rolled out cross-nationally. This deliverable reports on the results of a consultation exercise carried out with ESS National Coordinators and survey agencies to identify potential legal, technological and organisational barriers to implementing the FMS on the ESS. Issues were identified around the need to transfer and store personal data across national borders, a lack of handheld mobile devices available within agencies, insufficient IT and end-user support in some countries, and a lack of support for a centralised monitoring tool among many ESS survey agencies, particularly those with their own well-developed in-house monitoring systems.
D4.8 – Fieldwork Management and Monitoring System (FMMS) for the European Social Survey: Report on test case scenarios
Researcher testing of the FMS was conducted in June 2016 to test how far the tool currently under development meets the business needs of the ESS and to detect missing or defective functionalities. Issues identified during this first round of testing were then addressed as far as possible by developers prior to retesting in September 2016. Feedback from researcher testing fed into a final version of the FMS prototype scheduled for finalisation and validation in early 2017.
D4.9 – Fieldwork Management and Monitoring System (FMMS) for the European Social Survey: Report on interviewer testing of the final mobile application and central database
Classroom-based testing of the FMS app and data exchange with the CCMS took place with UK-based ESS interviewers in October 2016. Its purpose was to gather end-user feedback on whether the app’s workflow and user interface met interviewer needs.
Survey Project Management Portal: MyEVS
The Survey Project Management Portal (SMaP) is a virtual collaborative work environment that supports data collection projects in the various phases of the project lifecycle. It combines project and data management tools that are customized to the roles and activities of the different stakeholders of a survey project. This guide for the demo version of the tool provides information about the test environment for interested survey projects.
D4.13 – Survey Project Management Portal (SMaP): Manual for researchers for using the project management platform
This user manual covers the key features and functions of a sample portal with a low level of restrictions, i.e. users are allowed to customise a few features. A series of reality-based examples with explanations guides portal users through their project tasks.
D4.14 – Survey Project Management Portal (SMaP): Manual for implementation and use for maintainers of the project
In its final form, SMaP is a web application based on the open source project management software eXo Platform Community Edition (CE), customized to correspond as closely as possible to specific survey lifecycle requirements. The purpose of this documentation is to guide survey projects through the implementation of their own SMaP by giving instructions on the installation and configuration of the eXo Platform and on the deployment process from eXo Platform to a project-specific SMaP.
Variable Harmonisation Hub
Cross-national and longitudinal data harmonisations and data linking are vital in realising the full reuse value of European-level data. Task 4.5 was to develop a new, online library of harmonisation routines where digitally documented harmonisation projects will be archived and accessed.
D4.15 – Online Library of Harmonisations: Report on workflows for harmonisation submission and community peer-review processes
Having developed a new, online library of harmonisations where digitally documented projects will be archived and accessed, this deliverable provides a summary of the work that has been completed. Specifically, it provides details on the user interface that will be developed for the tool and defines the workflow for the peer review of harmonisations that will be submitted to the online library.
Building a survey network for Europe
The ‘SERISS Survey Experts Network’ is a series of workshops thematically based around SERISS work packages. The aim of the workshops is to bring together survey practitioners and researchers (e.g. representatives from national statistics institutes, cross-national European surveys, survey agencies and survey methodologists) in order to facilitate a productive exchange of knowledge and practices in state-of-the-art survey research, to initiate a discussion on how to tackle specific challenges in survey methodology and data harmonisation, and to encourage future cooperation between different organisations.
This report provides a summary of the first Survey Network workshop ‘Representing the population in surveys’, which draws on work done in SERISS Work Package 2 dealing with sampling approaches and challenges in cross-national surveys. The workshop took place in December 2016 and was hosted by the Munich Center for the Economics of Aging (MEA). The main purpose of the first workshop was to review sampling practices across Europe, to exploit synergies to be gained from exchanging knowledge and to discuss possible cooperation in gaining better access to registers or sharing sampling frames across surveys.
The second Survey Experts Network Meeting took place at the University of Amsterdam in September 2017. The main purpose of the workshop was to bring together researchers, survey practitioners (e.g. cross-national survey infrastructures, commercial survey agencies, representatives of non-profit organisations conducting social surveys) and other stakeholders (e.g. national statistics institutes, employment agencies) involved in designing, coding and analysing socio-economic questions in order to demonstrate coding and harmonisation tools developed under SERISS, to offer the participants an opportunity to try out the tools during the workshop, and to provide feedback and suggestions for tool upgrades and training materials.
D5.11 – Survey Experts Network Meeting 3: Legal and ethical issues of combining survey data with new forms of data
Work Package 6 of the SERISS project addresses the major legal and ethical challenges facing cross-national social science research which relies on access to large-scale data on an individual level. There was no specific budget allocated for organising the workshops required for D6.1 and D6.2. These workshops were therefore organised in partnership with the SERISS Survey Experts Network with costs met out of the WP5 budget.
Social media data
The SERISS workshop ‘Legal and ethical issues of combining survey data with new forms of data’ took place on 19 June at City, University of London. This report gives a summary of the presentations relating to social media and notes key discussion points which relate to social media.
This report outlines guidelines on the use of social media data in survey research. The main purpose of this report is to provide guidance on the legal requirements and ethical challenges when using social media data in survey research. It is expected that research should safeguard principles of research ethics as well as legal conditions, and in many cases legal conditions overlap with principles of research ethics. These guidelines are structured as questions and corresponding answers that have been selected with legal and ethical issues in mind.
The SERISS workshop ‘Legal and ethical issues of combining survey data with new forms of data’ took place on 19 June at City, University of London. This report gives a summary of the presentations relating to administrative data and notes key discussion points which relate to administrative data.
This first deliverable of Task 6.3 (Connected curation and quality) focuses on the description of high level workflows for the curation of new and novel forms of data (NNfD). As these workflows provide the framework around which other elements of WP6 work will be benchmarked and evaluated, the text will be refined, revised, and expanded for subsequent deliverables. New and novel forms of data will lead to new curation challenges, not least around legal and ethical issues and the V’s of Big Data (Volume, Velocity, Variety, Veracity and Value), but existing methodologies and best practices will be used to address these issues wherever possible.
This deliverable addresses the version requirements for appraisal of, and access to, new and novel forms of data as part of Task 6.3 (Connected curation and quality). The report provides a broad analysis of the issues and challenges in designing and implementing a version management system and understanding the wider ecosystem of versioning approaches. These support the future application of versioning strategies to high level repository workflows.
This report provides contextual information relevant to the changes faced by data archives including the rapidly evolving academic and scientific ecosystems and then provides an overview of appraisal and selection practices and possible future approaches to addressing new and novel forms of data.
Modern research in the social sciences increasingly requires recording innovative variables such as objective health information. The inclusion of so-called biomarkers i.e. objective measures of biological and physical functions recently gained in importance for field surveys in terms of supplementing traditional self-reported survey data. However, while being able to complement population based survey data collection with biomarkers is of great value scientifically, the collection of these types of data is associated with legal and ethical challenges.
This deliverable addresses central legal requirements and ethical issues related to the collection of Dried Blood Spot (DBS) samples and provides a synopsis of policy-rules for collecting biomarkers in social surveys. By describing experiences when implementing the collection of biological samples in various European countries and Israel as part of the SHARE study, this deliverable also demonstrates the concrete legal and ethical challenges associated with the collection of DBS samples cross-nationally.
In recent years, there has been an upsurge in the use of biological specimens as objective health measurements in socio-economic surveys. High participation rates in the collection of biomarkers are desirable to enhance the statistical power by increasing the number of observations available for statistical analyses. Consent rates may depend on many factors. This report looks at the consent rates for the collection of dried blood spot (DBS) samples in the context of the sixth wave of the Survey of Health, Ageing and Retirement in Europe (SHARE) and focusses on two of these factors: different legal or ethical requirements in the participating countries and the expectations of SHARE interviewers regarding the success of the DBS collection. The analyses of these factors in correlation with actual consent rates suggest that the interviewers’ expectations have been more important for the success of the blood collection in terms of higher consent rates than specific deviations regarding the collection process based on legal and ethical requirements. This is an interesting result for survey practitioners who are concerned with the integration of the collection of biomarkers in socio-economic surveys: whereas legal or ethical requirements cannot be changed easily, the expectations of the interviewers can be influenced by good interviewer training.
D6.12 – Report on Practical Strategies to Include Biological Samples in Population-Based Social Surveys
More and more social surveys are collecting innovative variables. Complementing self-reported survey data with objective health measures helps to overcome typical biases in survey databases caused by individual perceptions and improves the precision of health measurement. Against this background, the inclusion of objective measures of biological and physical functions (i.e., so-called biomarkers based on physical measurements and biological specimens) and the linkage of survey data with administrative records have gained importance for population-based social surveys. This report assesses practical implementation.
The Cross-National Online Survey (CRONOS) panel designed and implemented as part of the SERISS project is the first attempt to establish a probability-based cross-national online panel following an input harmonisation approach. CRONOS is a pilot study to evaluate the effectiveness of panel recruitment off the back of an existing cross-national face-to-face survey (European Social Survey Round 8) in terms of costs, sample representativeness, participation rates, panel attrition over time and data quality. It serves as a ‘proof of concept’ from which to produce a blueprint for an online probability-based panel that a range of cross-sectional survey infrastructures could consider adopting in the future.
To inform the development of CRONOS seven existing general population web-based random probability panels were reviewed: LISS (Longitudinal Internet Studies for the Social sciences, The Netherlands), GIP (German Internet Panel, Germany), GESIS Panel (Germany), ELIPSS (Étude Longitudinale par Internet Pour les Sciences Sociales, France), NCP (Norwegian Citizen Panel, Norway), ATP (American Trends Panel, USA) and FFRISP (Face-to-Face Recruited Internet Survey Platform, USA). This report summarises the findings and recommendations from that review.
This report outlines the legal and ethical frameworks relating to the processing of personal data in cross-national web panels. It provides details about the CRONOS panel and the legal and ethical framework surrounding cross-national research in Europe. The report also explores specific issues relating to consent and confidentiality in CRONOS with reference to ethical guidelines, the current legislation and the new General Data Protection Regulation (GDPR).
One of the main goals of this pilot study is to explore the challenges associated with cross-national recruitment and implementation. This deliverable summarises the recruitment plans and decision process related to setting up CRONOS; these plans were informed by a literature review, practical and empirical evidence from similar projects, and cooperation with the numerous survey experts and organisations involved in the project.
Protocols designed for – and implemented during – the initial stages of the CRONOS panel are detailed in this deliverable. The strategies and documents presented in this deliverable are the result of the work of a large team of social scientists. This collaborative approach intended to find state-of-the-art methodological approaches that were suitable for implementation in all countries, so as to ensure as much methodological equivalence as possible.
This deliverable describes the costs incurred during CRONOS (CROss-National Online Panel) – the world’s first cross-national, input-harmonised, probability-based web panel. It examines the costs from three main perspectives: (a) the distribution of costs for one participating country, (b) the cost per interview minute (CPIM) for CRONOS compared with the CPIM for the parent survey (Round 8 of the European Social Survey), and (c) an estimation of the equivalent CPIM ratio for a potential second iteration of CRONOS (CRONOS 2) for lower/medium-wealth and higher-wealth countries.
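As a rough illustration of the CPIM comparison, the sketch below assumes that cost per interview minute is simply total cost divided by total completed interview minutes. The deliverable may define CPIM differently, and the function names and all figures are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def cost_per_interview_minute(total_cost: float, interview_minutes: float) -> float:
    """Assumed definition: total cost divided by total interview minutes."""
    if interview_minutes <= 0:
        raise ValueError("interview_minutes must be positive")
    return total_cost / interview_minutes


def cpim_ratio(panel_cpim: float, parent_cpim: float) -> float:
    """Ratio of the web panel's CPIM to the parent survey's CPIM."""
    if parent_cpim <= 0:
        raise ValueError("parent_cpim must be positive")
    return panel_cpim / parent_cpim


# Hypothetical figures for illustration only (not taken from the deliverable).
panel = cost_per_interview_minute(total_cost=50_000.0, interview_minutes=100_000.0)
parent = cost_per_interview_minute(total_cost=400_000.0, interview_minutes=200_000.0)
ratio = cpim_ratio(panel, parent)
```

A ratio below 1 would mean that, under these assumptions, a minute of web panel interviewing costs less than a minute of the face-to-face parent survey.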
This deliverable describes the content of online surveys carried out in the CRONOS panel. The panel has been set up to investigate the feasibility of building a cross-national online panel using probability-based samples recruited at the end of existing face-to-face surveys. A pilot study was set up in three countries: UK, Slovenia and Estonia in Round 8 of the ESS (2016).
This report describes a database system for management and administration implemented for the CRONOS panel. It was created for tasks related to panel administration and management such as sample selection, e-mail distribution and fieldwork monitoring. The database includes contact information, administrative data, panel maintenance data and selected background variables from the ESS Round 8 face-to-face interview. In addition, it facilitates monitoring of panel recruitment and analyses of respondent behaviour and participation over time.
This document summarises the selection process of the web programming tool (Questback) and documents which additional functionalities and features were customised to meet CRONOS requirements for the web survey platform. It also provides programming guidelines used for the first CRONOS survey (Welcome survey) and screenshots of the source version of the Welcome survey.
This deliverable describes a quality assessment of the first data collection of the CRONOS panel. This quality assessment is intended to establish a data quality baseline to assess panellists’ response behaviour in later waves. To establish the data quality baseline, a number of indicators related to how respondents completed the ‘welcome survey’ of the panel are evaluated.
One of the biggest challenges for online panels is the balancing act between high data quality and representativeness. Non-participation, breakoffs, or losing participation over waves are a risk for the representativeness of the data, while missing answers or satisficing response behaviour endangers the data quality. This report describes an experiment conducted in waves 2, 4 and 6 of the CROss-National Online Survey (CRONOS) panel. The assessment is intended to establish the impact of motivational messages in web surveys. A number of indicators related to data quality, survey evaluation and completion time were evaluated.
This document provides an overview of the CRONOS panel data release, which took place on 16 May 2018. It includes information about the project user guide; screenshots of the data release; the web links; the undertaken dissemination activities; and provides acknowledgement of all of the researchers involved in producing the CRONOS data.
SERISS WP8 develops a cross-country harmonised, fast, high-quality and cost-effective coding module for the core variables: occupation, industry, employment status, educational attainment and field of education. The module uses a large multi-lingual dictionary with tens of thousands of entries about job titles, industry names, fields of education and training, and employment status categories. Additionally, the module will include country-specific, structured lists of educational qualifications. The module will provide up-to-date codes to classify the variables, using international standardised classification systems.
For further information see www.surveycodings.org.
Deliverable 3.13 – Report on multilingual coding platform – relates to this section.
SERISS has funded the development of APIs (Application Programming Interfaces) to allow for self-coding of socio-economic variables by survey respondents. Three variables – occupation, industry and educational attainment – are currently available as an API. This deliverable briefly describes how the APIs can be used in web surveys and what is needed to use them for computer-based face-to-face or telephone interviews.
This deliverable includes the XML software for the survey questions to produce a coding module for five core socio-economic variables: Occupation, industry, employment status, educational attainment and field of education. It also includes the code for those survey answers not available in an API.
Deliverable D8.14 summarizes all survey questions and answers for the five core variables and their translation into 47 languages. Deliverable D8.15 shows how the survey questions and answers appear in web survey mode.
The objective of this deliverable was to extend the existing, multilingual WageIndicator occupation database to 99 countries and to 5,000 occupational titles, all coded according to ISCO-08. It builds on preparatory work done for the InGRID project. This project facilitated the design for the development of the occupation database accompanying this deliverable. The main output of this deliverable is a database of occupational titles, described in this report.
Current coding practices in multi-country surveys are mostly based on coding performed by a national survey agency and typically based on the national coding indexes. In most countries these indexes are prepared by the national statistical offices. For a multi-country validation of occupational coding activities of national survey agencies, the question arises whether the same occupational titles are coded into the same or in different ISCO-08 4-digit occupational units. For this deliverable as many coding indexes as could be found were collected, provided they used the ISCO-08 classification. The resulting merged validation database of coding indexes had 70,489 records of ISCO-08 5-digit occupational titles and their 4-digit code from 20 sources with 19 different languages. A check revealed that 10.3% of the records had codes that did not correspond with any codes in the official ISCO-08 classification.
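The kind of consistency check described above can be sketched as follows. This is a minimal illustration, not the procedure used for the deliverable: the function name, the record layout (title, 4-digit code) and the toy data are assumptions, and the set of official codes is a tiny illustrative subset rather than the full ISCO-08 classification.

```python
from typing import Iterable, Set, Tuple


def share_of_invalid_codes(
    records: Iterable[Tuple[str, str]], official_codes: Set[str]
) -> float:
    """Return the fraction of records whose code is not in official_codes."""
    records = list(records)
    if not records:
        return 0.0
    invalid = sum(1 for _title, code in records if code not in official_codes)
    return invalid / len(records)


# Toy data for illustration only; "5129" stands in for a code that is
# absent from the (here deliberately tiny) official code list.
official = {"2221", "2310", "5120"}
records = [
    ("nurse", "2221"),
    ("university lecturer", "2310"),
    ("cook", "5120"),
    ("cook", "5129"),
]
share = share_of_invalid_codes(records, official)
```

Applied to a merged index, a check of this shape yields the proportion of records whose codes fall outside the official classification.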
For this deliverable a tasks database has been developed to measure work activities – called tasks – within occupations. The starting point for this database was the task descriptions provided by the ILO for occupations at the most detailed ISCO-08 4-digit occupational groups in its ISCO-08 coding index (2012). The database has been supplemented with task descriptions from national coding indexes and with translated task descriptions from the WageIndicator web survey on Salaries.
For this deliverable, partner Institute for Employment Research (IER) has checked the database created in D8.2 with the 4,000 occupational titles and collected, translated and validated the titles into the five largest languages outside the EU, notably Russian, Mandarin, Arabic, Hindi and Bahasa. The results of the disagreement check and the translations in the five languages are presented in this report. The database is available here.
In addition to respondents’ highest educational qualification, some surveys also collect data on their main field of education. Current measurement practice involves either a closed question with highly aggregated response categories, which are difficult to use for respondents, or an open question, requiring expensive post-coding. Therefore, a measurement tool for fields of education was developed. The database, including a live search feature, is available at the surveycodings website or via download.
Many questionnaires have a question “Please write the main business activity of the organisation where you work”. The question is commonly asked with an open text field, leaving the survey holder to code the response into an industry classification. Alternatively, in web surveys respondents can self-identify their industry from a database.
This paper builds on previous work of the author for the industry question in the WageIndicator web survey (since 2001). For this survey a database and a search tree were developed, all coded according to the European Community NACE Statistical Classification of Economic Activities, commonly referred to as NACE Rev. 2.0. WageIndicator did not aim for a question with a short aggregated list, because of the related aggregation bias whereby respondents may classify their detailed industry into different aggregated categories. Therefore, a database of industry names and a two-level search tree were developed, offering 300 industry categories to the survey respondents. The database has English and national industry names. The national labels are shown to the survey respondent.
For deliverable D8.10 “Database of industries + explanatory note” the WageIndicator database was extended to 47 languages, suitable for use in 99 countries. The deliverable’s accompanying database consists of the industry codes, the English master label and the national labels for all 99 countries and 47 languages. Note that the codes correspond fully to the 3 or 4 digit codes of NACE2.0. For any database, current Internet technologies allow for an API (Application Programming Interface), which means that the database can also be used offline, for example in tablets. From February 2017 the industry database will be available.
Many questionnaires have a question “Please write the main business activity of the organisation where you work”. Experience of the WageIndicator survey suggests respondents tend to skip the question about industry relatively more often compared to other questions, presumably because they judge answering the question as cognitively too demanding. An occupation>industry prediction tool has therefore been developed, providing survey respondents with a limited set of industries from which to pick based on their stated occupation. The report explains how the predictions were arrived at whilst the accompanying database gives the predictions derived for all 4 digit ISCO codes.
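One simple way to picture such a prediction tool is a lookup table from occupation code to a ranked shortlist of industries. The sketch below is an assumption about the general shape only: the table contents, the function name and the fallback behaviour are hypothetical and are not taken from the accompanying database.

```python
from typing import Dict, List

# Hypothetical prediction table: ISCO-08 4-digit code -> industry codes,
# ordered from most to least likely. Values are illustrative only.
PREDICTIONS: Dict[str, List[str]] = {
    "2221": ["86.1", "87.1", "86.9"],
    "2310": ["85.4", "72.1"],
}


def suggest_industries(isco_code: str, limit: int = 3) -> List[str]:
    """Return up to `limit` predicted industries for an occupation code.

    An empty list signals that no prediction exists, in which case a
    survey could fall back to the full industry search.
    """
    return PREDICTIONS.get(isco_code, [])[:limit]


shortlist = suggest_industries("2221", limit=2)
fallback = suggest_industries("9999")
```

Presenting only the shortlist keeps the question short for the respondent, while the empty-list fallback leaves room for occupations with no usable prediction.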
Employment status is a measure of an individual’s position in the labour market. The International Labour Organisation (ILO) maintains the International Classification by Status in Employment (ICSE). In 1993 ICSE was defined as a classification with six categories. In 2013, the ILO planned to revise the classification, but its statisticians postponed final decision making until 2018. This paper develops survey questions for the measurement of ICSE-93. It then adds greater detail to the initial six categories, building on the suggestions proposed in 2013. The revised ICSE classification is a three-level classification with the six ICSE-93 groups at the first level, eight categories at the second and 13 at the third level. For the full classification 28 variables are needed to measure the revised ICSE. These are detailed in the paper.
This deliverable reports on survey questions designed to measure the revised ICSE classification. These survey questions and answers have been translated into 47 languages, facilitating the measurement of the ICSE classification in 99 countries. The deliverable provides the coding scheme and the syntax needed to convert the data from the survey questions into the revised ICSE classification.
Socio-economic status (SES) is a measure of an individual’s economic and social position. Over the past decades numerous studies have elaborated the measurement and its effect on a set of outcomes, with a predominant focus on the United States and the United Kingdom. In the early 2000s the European Socio-Economic Classification (ESeC) was developed. The 2008 revision of the ISCO occupational coding challenged the ESeC classification, and Eurostat called for an update, which was called the European Socio-Economic Groups (ESeG-2014). The ESeG-2014 classification is a two-level classification of nine groups and 42 subgroups, to ensure a quick and uncomplicated implementation in all statistical sources. Four variables are needed to measure ESeG-2014, notably the two core variables ISCO08 occupation and employment status (employee / self-employed), and two additional variables for people not in paid employment, notably status (retired / student / disabled) and age.
This report details survey questions designed to measure ESeG-2014 at a detailed two-digit level. These survey questions and answers have been translated into 47 languages, facilitating the measurement of the ESeG-2014 classification in 99 countries. With fewer survey questions the one-digit ESeG-2014 classification can be measured. The deliverable provides the coding scheme and the syntax needed to convert the data from the survey questions into the ESeG-2014 classification.
This report summarizes all survey questions and answers used to produce a coding module for five core socio-economic variables: Occupation, industry, employment status, educational attainment and field of education. For each variable separate deliverables detail the arguments why the questions and answers are phrased as proposed. This report has an accompanying database that contains the translations of all survey questions and answers detailed in this deliverable. Translations are provided for 99 countries in 47 languages.
Building on D8.14, this deliverable shows how the module to collect data on the five core socio-economic variables (occupation, industry, employment status, educational attainment and field of education) appears when programmed in a web survey.
This deliverable reports on a workshop which took place to introduce coding tools developed under SERISS to a wider group of researchers, survey practitioners (e.g. cross-national survey infrastructures, commercial survey agencies, representatives of non-profit organisations conducting social surveys) and other stakeholders (e.g. national statistics institutes, employment agencies) involved in designing, coding and analysing socio-economic questions, to offer them an opportunity to try out the tools during the workshop and to provide feedback and suggestions for tool upgrades and training materials.
Presentations from the workshop are included in an appendix.
Measuring social networks
D8.20 – Name generator and questionnaire items for Social Network Module
D8.21 – Translated name generator and questionnaire items
Social networks are the collection of personal ties that individuals variously maintain and from which they gain a range of benefits, supports and services. Given the significance of the social network construct for both science and policy, SHARE is developing a unique module for the measurement of social networks that can serve as a model for other surveys. The SHARE Social Network Module (SN) is based principally on the approach that was employed in the National Social Life, Health, and Aging Project, in the United States, in 2005-2006 (Cornwell et al., 2009). The module applies a name generating mechanism in which respondents identify the people who are important to them and then add information on each person named (via “name interpreter questions”). It also allows the tracing of changes in respondents’ social networks over time and is programmed to avoid respondents having to duplicate information provided.
D8.20 describes the basic structure of the name generator, as it is developing, and its mode of operation.
D8.21 showcases the production of the name generator tool in a wide range of languages that are spoken in Europe. The SHARE country teams have translated the SN module into the different national languages of the participating countries. This deliverable presents the generic questionnaire in English and the available translations from Austria, Belgium (French and Dutch), Switzerland (German, French and Italian), the Czech Republic, Germany, Denmark, Spain, Spain Girona, France, Croatia, Italy, Luxembourg (German, French and Portuguese), Poland, Sweden and Slovenia.
The work on social networks builds on initial work by SHARE and focuses on designing a ‘name generator’ for cross-national surveys (D8.21, above). A key aim was to construct and translate computerized questionnaire items that can be used in longitudinal and cross-sectional settings for all social surveys. This deliverable reports on subsequent work to generate a standardized classification of network types, based on the data collected.