Day 5 of New Day New Learning
New Day New Learning #linkedin #cfbr

Data Analysis Process:

1. Why is data collection important, and how do companies perform it?

  • Data collection is important because it provides the information needed to answer questions, analyze business performance or other outcomes, and predict future trends and scenarios. It supports strategic planning, research, and other projects.
  • In business, data collection occurs on multiple levels. Companies conduct surveys and track social media to get feedback from customers. Take Instagram as an example: its algorithm shows people more of the content they like and recommends similar content in the same niche, which increases engagement. The resulting clicks and actions on the platform give shareholders and teams data to analyze, so they can decide which types of content to focus on. Data scientists and analysts then collect the relevant data from internal systems, plus external data sources if needed.

A few ways to collect customer data:

  • Center of Gravity: business partnerships, IoT ecosystems, mobile apps, operational systems, social media, multimedia, and financial transactions.

2. What are the different methods of data collection?

  • Data can be collected from one or more sources, as needed to provide the information being sought. The methods used vary by type of application. The following are some common data collection methods:
    1. Automated data collection functions built into business applications, websites, and mobile apps.
    2. IoT-based data collection.
    3. Tracking social media, discussion forums, review sites, blogs, and other online channels.
    4. Focus groups and one-on-one interviews.
    5. Direct observation of participants in a research study.

3. What are common challenges faced in data collection?

  • Finding relevant data: With a wide range of systems to navigate, gathering the data to analyze can be a complicated task for data scientists and other users in an organization. Data curation techniques make it easier to find and access data. (Data curation means creating, organizing, and maintaining data sets so that they are easy to access.)
  • Data quality issues: Raw data often contains errors, inconsistencies, and other problems. Measures taken during collection help maintain quality, but they rarely catch everything. As a result, collected data usually needs to be put through data profiling to identify issues and data cleansing to fix them.
  • Deciding which data to collect: This is a fundamental issue, both for the upfront collection of raw data and when users gather data for analytics applications. Collecting data that isn't needed adds time, cost, and complexity to the process, while a data set that is too small limits how good the resulting decisions can be.
  • Dealing with big data: Big data environments typically include a combination of structured, unstructured, and semi-structured data in very large volumes. That makes the initial data collection and processing stages more complex. In addition, data scientists often need to filter sets of raw data stored in data lakes for specific analytics applications.
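
The profiling and cleansing steps mentioned under "Data quality issues" can be sketched in a few lines of Python. This is only a minimal illustration, not a production pipeline: the records, the field names (`customer_id`, `country`, `age`), and the helper functions `profile` and `cleanse` are all made up for the example.

```python
from collections import Counter

# Hypothetical raw records, invented for illustration only.
raw_records = [
    {"customer_id": "1", "country": "US", "age": "34"},
    {"customer_id": "2", "country": "us ", "age": ""},
    {"customer_id": "2", "country": "us ", "age": ""},    # exact duplicate
    {"customer_id": "3", "country": "DE", "age": "abc"},  # invalid age
]


def profile(records):
    """Data profiling: count missing values, invalid ages, and duplicate rows."""
    missing = Counter()
    invalid_age = 0
    duplicates = 0
    seen = set()
    for row in records:
        for field, value in row.items():
            if value.strip() == "":
                missing[field] += 1
        age = row["age"].strip()
        if age and not age.isdigit():
            invalid_age += 1
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"missing": dict(missing), "invalid_age": invalid_age, "duplicates": duplicates}


def cleanse(records):
    """Data cleansing: drop duplicates, normalize country codes, null out bad ages."""
    cleaned = []
    seen = set()
    for row in records:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # skip rows we have already kept
        seen.add(key)
        age = row["age"].strip()
        cleaned.append({
            "customer_id": int(row["customer_id"]),
            "country": row["country"].strip().upper(),
            "age": int(age) if age.isdigit() else None,
        })
    return cleaned


report = profile(raw_records)
clean_records = cleanse(raw_records)
print(report)
print(clean_records)
```

Profiling only reports what is wrong; cleansing is the separate step that changes the data. Keeping the two apart makes it easier to review the issues before deciding how to fix them.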

4. What is the difference between a data lake and a data warehouse?

  • Warehouses are more secure and easier to use but more costly and less agile. Data Lakes are flexible and less expensive, but they require expert interpretation and lack the same level of security.
  • Data warehouse examples: Snowflake, Google BigQuery, Amazon Redshift, Azure Synapse Analytics, IBM Db2 Warehouse, Firebolt.

  • Data lakes can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), and unstructured data (emails, documents, PDFs).
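
To make the "semi-structured" category a bit more concrete, here is a small Python sketch using JSON. Each record carries its own field names, but records don't have to share the same fields, so one common step before analysis is to flatten them into fixed columns. The sample orders and their field names are hypothetical.

```python
import json

# Hypothetical semi-structured input: every record names its own fields,
# and not every record has the same ones.
json_text = """
[
  {"order_id": 1, "amount": 19.99, "coupon": "WELCOME"},
  {"order_id": 2, "amount": 5.0},
  {"order_id": 3, "amount": 12.5, "gift": true}
]
"""

records = json.loads(json_text)

# Build one fixed column list from the union of all keys.
columns = sorted({key for record in records for key in record})

# Flatten each record into a row; fields a record lacks become None.
rows = [tuple(record.get(column) for column in columns) for record in records]

print(columns)
for row in rows:
    print(row)
```

After this step the data has the rows-and-columns shape that a relational table or a warehouse expects.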

5. What is a database management system (DBMS), and what are its types?

  • A DBMS is a software system used to store, retrieve, and run queries on data. It serves as an interface between an end user and the database, allowing users to create, read, update, and delete data in the database.
  • Types: relational database (uses primary keys and candidate keys), object-oriented database (inheritance, data encapsulation, polymorphism), hierarchical database (parent and child nodes), and network database (one-to-one or many-to-many relationships).
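
As a rough illustration of the create, read, update, and delete operations in a relational DBMS, here is a sketch using SQLite through Python's built-in `sqlite3` module. The `customers` table and its columns are hypothetical. `id` plays the role of the primary key, and `email` is marked `UNIQUE`, so it could serve as a candidate key.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    " id INTEGER PRIMARY KEY,"      # primary key
    " email TEXT UNIQUE NOT NULL,"  # unique, so usable as a candidate key
    " name TEXT)"
)

# Create
conn.execute("INSERT INTO customers (id, email, name) VALUES (?, ?, ?)",
             (1, "ana@example.com", "Ana"))
conn.execute("INSERT INTO customers (id, email, name) VALUES (?, ?, ?)",
             (2, "ben@example.com", "Ben"))

# Read
names = [row[0] for row in conn.execute("SELECT name FROM customers ORDER BY id")]

# Update
conn.execute("UPDATE customers SET name = ? WHERE id = ?", ("Ana Maria", 1))

# Delete
conn.execute("DELETE FROM customers WHERE id = ?", (2,))

remaining = conn.execute("SELECT id, email, name FROM customers").fetchall()

# The primary key constraint rejects a second row with id = 1.
try:
    conn.execute("INSERT INTO customers (id, email, name) VALUES (?, ?, ?)",
                 (1, "dup@example.com", "Dup"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.close()
print(names, remaining, duplicate_rejected)
```

The `?` placeholders let the DBMS handle the values safely instead of building SQL strings by hand.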

Sietse-Arne Schelpe

AI developer - Online Marketing Specialist Founder/COO @ wetime | Phyton, PHP, SEO, SEA, Affiliatie, AI developer, specialized in creating unique models and datasets

9 months ago

Wow, congratulations on your commitment to continuous learning! It's great to see that you're diving into the fascinating world of data analysis. Keep up the amazing work! #lifelonglearner #dataanalysis #growthmindset
