An analyst needs to conduct a quick analysis. Which of the following is the FIRST step the analyst should perform with the data?
Correct Answer:
A
The first step the analyst should perform with the data is to conduct an exploratory analysis and use descriptive statistics. Exploratory analysis is a type of analysis that aims to summarize the main characteristics of the data, identify patterns, outliers, and relationships, and generate hypotheses for further investigation. Descriptive statistics are numerical measures that describe the central tendency, variability, and distribution of the data, such as mean, median, mode, standard deviation, range, quartiles, etc. Exploratory analysis and descriptive statistics can help the analyst gain a better understanding of the data and its quality, as well as prepare the data for further analysis.
An analyst has been asked to validate data quality. Which of the following are the BEST reasons to validate data for quality control purposes? (Choose two.)
Correct Answer:
B
Integrity and D. Consistency. This is because integrity and consistency are two of the best reasons to validate data for quality control purposes, which means to check and ensure that the data is accurate, complete, reliable, and usable for the intended analysis or purpose. By validating data for integrity and consistency, the analyst can prevent or correct any errors or issues in the data that could affect the validity or reliability of the analysis or the results. Here is what integrity and consistency mean in terms of data quality:
✑ Integrity refers to the completeness and validity of the data, which means that the data has no missing, incomplete, or invalid values that could compromise its meaning or usefulness. For example, validating data for integrity could involve checking for null values, outliers, or incorrect data types in the data set.
✑ Consistency refers to the uniformity and standardization of the data, which means
that the data follows a common format, structure, or rule across different sources or systems. For example, validating data for consistency could involve checking for spelling, punctuation, or capitalization errors in the data set.
The other reasons are not the best reasons to validate data for quality control purposes. Here is why:
✑ Retention refers to the storage and preservation of the data, which means that the data is kept and maintained in a secure and accessible way for future use or reference. Retention does not need to be validated for quality control purposes, because it does not affect the accuracy or reliability of the data itself.
✑ Transmission refers to the transfer and exchange of the data, which means that
the data is moved or shared between different sources or systems in a fast and efficient way. Transmission does not need to be validated for quality control purposes, because it does not affect the completeness or validity of the data itself.
✑ Encryption refers to the protection and security of the data, which means that the
data is encoded or scrambled in a way that prevents unauthorized access or use. Encryption does not need to be validated for quality control purposes, because it does not affect the uniformity or standardization of the data itself.
✑ Deletion refers to the removal and disposal of the data, which means that the data
is erased or destroyed in a way that prevents recovery or retrieval. Deletion does not need to be validated for quality control purposes, because it does not affect the meaning or usefulness of the data itself.
Which of the following is a common data analytics tool that is also used as an interpreted, high-level, general-purpose programming language?
Correct Answer:
D
The option that is a common data analytics tool that is also used as an interpreted, high-level, general-purpose programming language is Python. Python is a popular and versatile programming language that can be used for various purposes, such as web development, software development, automation, machine learning, and data analysis. Python has many features and libraries that make it suitable for data analytics, such as its simple syntax, dynamic typing, multiple paradigms, built-in data structures, NumPy, pandas, matplotlib, scikit-learn, etc. The other options are not programming languages, but software applications or platforms that are used for data analytics or related tasks. SAS is a software suite that provides advanced analytics, business intelligence, data management, and predictive analytics capabilities. Microsoft Power BI is a business analytics service that provides interactive visualizations and business intelligence capabilities. IBM SPSS is a software package that offers statistical analysis, data mining, text analytics, and predictive analytics capabilities. Reference: Python For Data Analysis - DataCamp
Samantha needs to share a list of her organization's top 50 customers with the VP of sales.
She would like to include the name of the customer, the business they represent, their contact information, and their total sales over the past year.
The VP does not have any specialized analytics skills or software but would like to make some personal notes on the dataset.
What would be the best tool for Samantha to use to share this information?
Correct Answer:
B
Microsoft Excel.
This scenario presents a very simple use case where the business leader needs a dataset in an easy-to-access form and will not be performing any detailed analysis.
A simple spreadsheet, such as Microsoft Excel, would be the best tool for this job. There is no need to use a statistical analysis package, such as SAS or Minitab, as this would likely confuse the VP without adding any value. The same is true of an integrated analytics suite, such as Power BI.
A data analyst needs to present the results of an online marketing campaign to the marketing manager. The manager wants to see the most important KPIs and measure the return on marketing investment. Which of the following should the data analyst use to BEST communicate this information to the manager?
Correct Answer:
D
The option that the data analyst should use to best communicate the information to the manager is a summary with statistics, conclusions, and recommendations from the data analyst. A summary is a concise and clear way of presenting the main findings and insights from the data analysis report. A summary should include relevant statistics that support the conclusions and recommendations from the data analyst. A summary should also highlight the most important KPIs and measure the return on marketing investment in relation to the objectives of the online marketing campaign. The other options are not as effective as using a summary to communicate the information to the manager, as they either provide too much or too little information or do not address the manager??s needs or expectations. A real-time monitor may provide too much information that can be overwhelming or distracting for the manager who wants to see only the most important KPIs and measure the return on marketing investment. A self-service dashboard may provide too little information that can be insufficient or unclear for the manager who wants to see some guidance and interpretation from the data analyst. A spreadsheet of raw data may provide irrelevant or inaccurate information that can be confusing or misleading for the manager who wants to see some analysis and insights from the data analyst. Reference: [How to Write an Executive Summary for Your Data Analysis Report - Towards Data Science]