COMP3425 and COMP8410 Data Mining S1 2024
Assignment 2: Description of Data
Data and Metadata
The data supplied for the assignment arises from The Australian Data Archive’s ANU Poll Dataverse [1]. As a student of the course, you are assumed to accept the Terms and Conditions of Use reproduced below. Please read them carefully. The custodian of the data has further requested that you delete your copy of the data at the end of the course. You can, however, obtain another copy by request to the Australian Data Archive.
In particular, the data captures the results of a survey poll conducted in late 2023 on the topic of the 14th October 2023 Australian Constitutional Referendum on the Aboriginal and Torres Strait Islander Voice to Parliament. You can find a complete description of the purpose of the poll and the coding of the data (metadata), as well as a descriptive summary of the poll results, here:
https://dataverse.ada.edu.au/dataset.xhtml?persistentId=doi:10.26193/13NPGQ
The data is provided to you for the assignment. You have the original dataset as downloaded from the ADA, called 02_ANUPoll_57_CSV_100150_general.csv, in comma-separated-values format. This data is described by the metadata in 01_ANUPoll_57_DataDictionary_100150_general.xlsx and the corresponding question text in 01_ANUPoll_57_Questionnaire_100150.docx.
If you are a COMP3425 (undergraduate) student, you are required to undertake some pre-processing steps as specified in the assignment specification. If you are a COMP8410 (postgraduate) student, you may choose your own pre-processing actions, but you may find that referring to the COMP3425 assignment specification will help you.
A Note on Data Types
Note that most of the data is either nominal or ordinal. Many ordinal variables include some marker values that are not ordinal, but indicate unordered categories as exceptions to the ordinal values. Be careful that you do not blindly handle those marker values as ordinal, and that you do not treat nominal data as ordinal without specifically justifying why you do so. Appropriate handling may depend on the mining methods you use.
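As a rough illustration of the point above, the following Python sketch separates marker values from genuine ordinal codes in one column, so that the markers are not accidentally ranked alongside the ordinal values. The marker codes (-99, -98), their labels, and the sample column are hypothetical and are not taken from the dataset; the real codes for each variable must be looked up in the data dictionary.

```python
from typing import Dict, Optional, Tuple

# Hypothetical marker codes, for illustration only.
# Look up the real codes for each variable in the data dictionary.
MARKER_CODES: Dict[int, str] = {-99: "Refused", -98: "Don't know"}


def split_ordinal(
    raw: str, markers: Dict[int, str] = MARKER_CODES
) -> Tuple[Optional[int], Optional[str]]:
    """Return (ordinal_value, marker_label) for one raw cell.

    Exactly one of the two is None: a genuine ordinal code keeps its
    integer value, while a marker or empty cell becomes a label.
    """
    text = raw.strip()
    if text == "":
        return None, "Missing"
    code = int(text)
    if code in markers:
        return None, markers[code]
    return code, None


# Made-up column values standing in for one ordinal survey variable.
column = ["1", "4", "-99", "", "3", "-98"]
pairs = [split_ordinal(cell) for cell in column]

ordinal_values = [value for value, _ in pairs if value is not None]
marker_labels = [label for _, label in pairs if label is not None]

print(ordinal_values)  # [1, 4, 3]
print(marker_labels)   # ['Refused', 'Missing', "Don't know"]
```

Whether the markers are then dropped, imputed, or kept as a separate nominal variable is a modelling decision that depends on the mining method.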
You can convert a nominal variable that Rattle loads as numeric by default using Rattle’s “Transform” tab (Recode -> As Categoric). Alternatively, you can use Excel before loading, as in the following example.
For the nominal variable p_state_sdc, the formula CONCATENATE("""", <p_state_sdc>, """") is used. If the variable has empty cells that you want to map to the “0” nominal value, you can instead use the formula CONCATENATE("""", TEXT(<p_state_sdc>, "0"), """"). In both cases, replace the variable name, written as <p_state_sdc> in these examples, with the Excel cell reference, such as FB2.
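For readers who prefer scripting to Excel, the per-cell effect of the two formulas can be approximated in Python. This is a minimal sketch, not part of the assignment tooling: the function name quote_code and its empty_as_zero parameter are hypothetical, and it assumes the cells hold integer codes (Excel's TEXT with the "0" format would also round non-integer numbers, which this sketch does not attempt).

```python
def quote_code(cell: str, empty_as_zero: bool = False) -> str:
    """Wrap a raw cell value in literal double quotes.

    empty_as_zero=False mirrors CONCATENATE("""", x, """").
    empty_as_zero=True mirrors the TEXT(x, "0") variant for empty
    cells, which become the "0" nominal value.
    """
    text = cell.strip()
    if text == "" and empty_as_zero:
        text = "0"
    return '"' + text + '"'


# Made-up cell values standing in for a nominal column such as p_state_sdc.
cells = ["1", "3", "", "8"]

print([quote_code(c) for c in cells])
# ['"1"', '"3"', '""', '"8"']
print([quote_code(c, empty_as_zero=True) for c in cells])
# ['"1"', '"3"', '"0"', '"8"']
```

The quoted strings are no longer parseable as numbers, which is what leads a loader to treat the column as categoric rather than numeric.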
References
[1] Biddle, Nicholas; McAllister, Ian, 2023, "ANU Poll 57/Australian Constitutional Referendum Survey (ACRS) (October 2023): Aboriginal and Torres Strait Islander Voice to Parliament", doi:10.26193/13NPGQ, ADA Dataverse, V4
Terms and Conditions of Use
This data has been distributed exclusively for students of COMP3425 and COMP8410 S1 2024. Data must be destroyed at the end of the course but may be re-obtained by request to the Australian Data Archive.
Furthermore, from
https://dataverse.ada.edu.au/dataset.xhtml?persistentId=doi:10.26193/13NPGQ
I acknowledge that:
1. Use of the material is restricted to use for analytical purposes and that this means that I can only use the material to produce information of an analytical nature. Examples of such uses are:
(a) the manipulation of data to produce means, correlations or other descriptive summary measures;
(b) the estimation of population characteristics from sample data;
(c) the use of data as input to mathematical models and for other types of analyses (e.g. factor analysis); and
(d) to provide graphical and pictorial representation of characteristics of the population or sub-sets of the population.
2. The material is not to be used for any non-analytical purposes, or for commercial or financial gain, without the express written permission of the Australian Data Archive. Examples of non-analytical purposes are:
(a) transmitting or allowing access to the data in part or whole to any other person / Department / Organisation not a party to this undertaking; and
(b) attempting to match unit record data in whole or in part with any other information for the purposes of attempting to identify individuals.
3. Outputs (such as statistics, tables and graphs) obtained from analysis of these data may be further disseminated provided that I:
(a) acknowledge both the original depositors and the Australian Data Archive;
(b) acknowledge another archive where the data file is made available through the Australian Data Archive by another archive; and
(c) declare that those who carried out the original analysis and collection of the data bear no responsibility for the further analysis or interpretation of it.
4. Use of the material is solely at my risk and I indemnify the Australian Data Archive and its host institution, The Australian National University.
5. The Australian Data Archive and its host institution, The Australian National University, shall not be held liable for any breach of this undertaking.
6. The Australian Data Archive and its host institution, The Australian National University, shall not be held responsible for the accuracy and completeness of the material supplied.
7. Once access has been granted to the data, abuses of access rights, breaches of this undertaking, or failure to keep the data safe, may result in the application of restrictions.
Restrictions will escalate in severity depending upon the seriousness of the breach and vary from termination of access to the user and/or institution, on either a temporary or permanent basis, through to potential legal action in the most extreme cases.
8. I will notify promptly the ADA of any non-compliance with these Terms and Conditions of Use or of any infringements of the data, including unintentional disclosure or any errors within the data of which I become aware.
9. At the conclusion of my research notify of use. This may include the offer of publication to the ADA any new dataset that has been derived from the materials supplied or which have been created by the combination of the material supplied with other available data. The deposit of the derived dataset will include sufficient supporting documentation to enable the new dataset to be made accessible to other users.