Ethics and Open Science

In order to engage with Open Science practices, you must consider some ethical issues that apply to your work. There are three key considerations:

  1. Being clear with participants at the time of collection about what you plan to use the research data for

Make sure that your participant information sheet and consent form inform participants that their anonymised data will be made available via a publicly accessible online data repository. Participants should consent to their data being freely shared. Although the GDPR legally requires consent only for sharing non-anonymised data, failing to obtain consent for sharing any data breaches ethical guidelines. Therefore, wording such as the following should be included in the participant information sheet and consent form:

Information sheet

Because the data we collect from you may be of interest to other researchers, we will publish it on a publicly accessible online data repository. At that point, anyone will have access to your data. However, this only concerns fully anonymised information that cannot identify you, and we will only do this with your permission (please confirm this on the consent form).

Consent form

I understand that data from this study may be of interest to other researchers. I consent for my non-identifiable data to be shared through a publicly accessible online data repository. I know that even if I do not consent to this, I can still participate.

If the dataset contains any identifying information, you should also specify how participants can withdraw their consent for sharing this data at any time in the future:

I know I can opt out of this at any time in the future by contacting xxxx, quoting my personal ID number.

Note that the example formulations above assume participants are given the option not to consent to their data being shared. Some researchers may make willingness to share anonymised data a requirement for participation, for instance when the data is not sensitive.

Also note that researchers are encouraged to specify who can access the shared data and for what purposes (this is typically specified through the copyright associated with the data). In some cases, researchers may require users to sign an End User License Agreement (EULA) before being given access to the data.

  2. Anonymity of data and protecting the identity of your participants

If data cannot be anonymised and shared, one solution is to share metadata (a summary of the data, such as group means, standard deviations, effect sizes, a table with zero-order correlations between all study variables, etc.). Metadata should be richly described and contain accurate and relevant attributes. A clear indication of who generated the metadata should also be included. Signposting for data that is available by request only may also be appropriate.
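As a rough illustration of what such a summary might contain, the sketch below computes group sizes, means, standard deviations, and one zero-order correlation using only the Python standard library. The records, the variable names, and the helper functions `group_summary` and `pearson_r` are hypothetical and made up for this example; they are not taken from any particular study or tool.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical raw records: (group, score_a, score_b).
# These values are invented purely for illustration.
records = [
    ("control", 4.0, 10.0),
    ("control", 6.0, 14.0),
    ("control", 5.0, 12.0),
    ("treatment", 7.0, 15.0),
    ("treatment", 9.0, 19.0),
    ("treatment", 8.0, 17.0),
]


def group_summary(rows):
    """Return {group: {"n", "mean", "sd"}} for score_a in each group."""
    by_group = defaultdict(list)
    for group, score_a, _ in rows:
        by_group[group].append(score_a)
    return {
        group: {
            "n": len(values),
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values),  # sample standard deviation
        }
        for group, values in by_group.items()
    }


def pearson_r(xs, ys):
    """Zero-order (Pearson) correlation between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Only these aggregates, not the rows themselves, would be shared.
summary = group_summary(records)
r_ab = pearson_r([r[1] for r in records], [r[2] for r in records])
```

Aggregates can still be revealing when a group is very small, so the group sizes are worth checking before a summary like this is shared.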

Data can be anonymous – with no identifiers – or confidential – with some information that can identify individuals which should be kept according to data protection legislation. All data should be kept according to the permissions we have sought and been granted by the participants or data custodians.

When data is confidential, such that it has some clearly identifiable information (e.g., addresses, names, postcodes, places, or other identifiers), much of this can be removed before data deposit or, in the case of data that is linked over time, replaced with a unique identifier. Where data requires anonymising before deposit, two individuals should independently inspect the dataset to remove any identifiable information or recode information in a standardised way so as to minimise the risk of identification. Particular care should be given to the following:

  • Information including names, addresses, demographic information, place names, specific instances, dates of birth, etc.: remove or systematically recode.

  • Textual information which may refer to or identify information or people: consider appropriate recoding which keeps meaning without identifying a person or situation.

  • Numeric information with a frequency of less than five, which may identify an individual: consider collapsing categories where this is meaningful, or recode to a generic “other” category.
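The first and last of these steps can be sketched in code. The example below is a minimal illustration using only the Python standard library: it drops direct identifiers, replaces them with a random participant ID so that records linked over time stay linked, and recodes categories that occur fewer than five times as "other". The column names, the sample rows, and the functions `pseudonymise` and `collapse_rare` are all hypothetical, and a script like this does not replace independent inspection of the dataset by two people.

```python
import secrets
from collections import Counter

DIRECT_IDENTIFIERS = {"name", "postcode"}  # hypothetical column names
MIN_CELL_COUNT = 5  # the "frequency of less than five" rule from the text


def pseudonymise(rows, key="name"):
    """Drop direct identifiers and attach one random ID per person.

    Returns (cleaned_rows, lookup). The lookup maps real identities to IDs,
    so it is itself identifying: store it securely and never deposit it.
    """
    lookup = {}
    cleaned = []
    for row in rows:
        person = row[key]
        if person not in lookup:
            new_id = "P" + secrets.token_hex(4)
            while new_id in lookup.values():  # regenerate on a rare collision
                new_id = "P" + secrets.token_hex(4)
            lookup[person] = new_id
        kept = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        kept["participant_id"] = lookup[person]
        cleaned.append(kept)
    return cleaned, lookup


def collapse_rare(rows, column, threshold=MIN_CELL_COUNT, label="other"):
    """Recode values of `column` seen fewer than `threshold` times as `label`."""
    counts = Counter(row[column] for row in rows)
    return [
        {**row, column: row[column] if counts[row[column]] >= threshold else label}
        for row in rows
    ]


# Invented sample data: five people in wave 1, one of them again in wave 2,
# plus one person with a rare occupation.
raw = [
    {"name": f"Person {i}", "postcode": "ZZ1 1ZZ", "occupation": "teacher", "wave": 1}
    for i in range(5)
]
raw.append({"name": "Person 0", "postcode": "ZZ1 1ZZ", "occupation": "teacher", "wave": 2})
raw.append({"name": "Person 5", "postcode": "ZZ9 9ZZ", "occupation": "astronaut", "wave": 1})

cleaned, lookup = pseudonymise(raw)
shareable = collapse_rare(cleaned, "occupation")
```

Whether a collapsed category is still meaningful for analysis is a judgement the script cannot make, so the recoded output should be reviewed before deposit.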

For anonymous data that do not include any identifying information as standard, one person should check the data before deposit for low-frequency items, which may need to be recoded as “other”. Responses to open questions should also be inspected for identifying information about the participant or others.

Note that in lab-based research, data may be pseudonymous, which means that the data itself contain no identifiable information, but the researchers themselves may know the identity of the participants from their interactions with them.

  3. Respecting the wishes of those who take part in research

A commonly used phrasing is “Be as open as possible, as closed as necessary”. In some fields, it may be important not to exclude participants who would not like their information shared; this would typically involve sensitive data. In research with non-sensitive, anonymised data, it may be possible to make data sharing a requirement for participation.

For a more extensive treatment of ethical considerations and practical tips regarding data sharing, see Meyer (2018).

References

Meyer, M. N. (2018). Practical tips for ethical data sharing. Advances in Methods and Practices in Psychological Science, 1(1), 131–144. https://doi.org/10.1177/2515245917747656