How to use open data for analytics?
Data science is not only a crucial tool for analyzing large volumes of information; it can also be used for social good. One promising area is the analysis of open data to identify anomalies in public procurement, as in the project that explores possible irregularities in the purchases of the municipality of Monterrey. In this context, we will learn how to find appropriate data sources and how to face the challenges inherent in public data analysis.
Where to find the data for our project?
For projects based on government data, local government transparency pages are an excellent source of information. In the case of Monterrey, historical data can be accessed through the transparency section of its official website. Here are some additional resources for you to consider:
- Open Government: Explore sections dedicated to transparency and accountability.
- Open Government Data: Visit open data portals from different levels of government.
- GEGOL: Platform that provides resources focused on open data analysis.
Remember that when downloading data from these sources, you will probably face some challenges in terms of format and structure.
What are common challenges in handling government data?
Government data, which is often published as Excel files, usually comes with difficulties that must be addressed before it becomes a useful dataset for analysis. Some of the most common challenges include:
- Visual rather than tabular formatting: The files often contain charts and summaries laid out for reading, which are not useful for analysis.
- Added explanatory headings: Unnecessary headings get in the way of transforming the data into a tabular structure.
- Unwanted additional information: The files may include supplier data or other metrics that are not relevant to the purpose of the analysis.
Being familiar with these challenges lets you anticipate them and develop strategies to handle them efficiently. An indispensable step is cleaning the dataset, which by some estimates consumes up to 90% of the time in a data science project.
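To make the challenges above more concrete, here is a minimal Python sketch of one possible cleanup, using only the standard library. It assumes the sheet has already been exported to CSV text; the sample content, the column names (Date, Supplier, Concept, Amount), and the helper names `find_header_index` and `clean_sheet` are all hypothetical and invented for illustration, not taken from the Monterrey files. The rule used to drop summary rows (skip any row with an empty value in a kept column) is also an assumption that may not suit every file.

```python
import csv
import io

# Hypothetical export of one sheet: a title row, a blank row, the real
# header, two data rows, another blank row, and a totals row at the bottom.
RAW_SHEET = """Municipality purchases report,,,
,,,
Date,Supplier,Concept,Amount
2023-01-05,Supplier A,Office supplies,"1,200.50"
2023-01-09,Supplier B,Street lighting,"15,000.00"
,,,
Total,,,"16,200.50"
"""


def find_header_index(rows, expected_columns):
    """Return the index of the first row that contains every expected column name."""
    for index, row in enumerate(rows):
        cells = {cell.strip() for cell in row}
        if expected_columns <= cells:
            return index
    raise ValueError("header row not found")


def clean_sheet(text, keep_columns):
    """Turn a visually formatted sheet into a list of tabular records."""
    rows = list(csv.reader(io.StringIO(text)))
    header_index = find_header_index(rows, set(keep_columns))
    header = [cell.strip() for cell in rows[header_index]]

    records = []
    for row in rows[header_index + 1:]:
        record = dict(zip(header, row))
        kept = {column: record.get(column, "").strip() for column in keep_columns}
        # Assumption: blank rows and summary rows leave at least one kept
        # column empty, so they can be skipped with a single check.
        if not all(kept.values()):
            continue
        # Remove thousands separators so the amount becomes a number.
        kept["Amount"] = float(kept["Amount"].replace(",", ""))
        records.append(kept)
    return records


# Drop the Supplier column, which this hypothetical analysis does not need.
records = clean_sheet(RAW_SHEET, ["Date", "Concept", "Amount"])
for record in records:
    print(record)
```

Locating the header by its column names, rather than by a fixed row number, is a design choice that tolerates files where the number of explanatory rows at the top varies from one download to the next.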
How do we prepare to meet the challenges of the dataset?
First, it is important to explore and familiarize yourself with your dataset. Open the file in a data analysis tool, such as Excel or an open source platform, and carefully examine each sheet and category to identify the parts you won't need. In some cases, you will be working with data that is so large or complex that you will not be able to fully examine it in this way, but in more manageable projects, this initial review will give you a better idea of the steps needed to clean and structure your data.
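When a workbook has many sheets, part of this first review can be done in code. The following is a rough sketch, assuming each sheet has already been loaded as a list of rows (each row a list of cell strings). The workbook contents, the sheet names, and the helper `summarize_sheet` are hypothetical and only illustrate the idea.

```python
def summarize_sheet(rows, preview=3):
    """Return basic facts about one sheet given as a list of rows of cell strings."""
    non_empty = [row for row in rows if any(cell.strip() for cell in row)]
    return {
        "total_rows": len(rows),
        "non_empty_rows": len(non_empty),
        "max_columns": max((len(row) for row in rows), default=0),
        "preview": non_empty[:preview],
    }


# Hypothetical workbook: sheet names and contents are invented for illustration.
workbook = {
    "Summary": [["Purchases 2023", ""], ["", ""], ["See chart below", ""]],
    "Detail": [
        ["Date", "Supplier", "Amount"],
        ["2023-01-05", "Supplier A", "1200.50"],
    ],
}

for name, rows in workbook.items():
    summary = summarize_sheet(rows)
    print(name, summary["total_rows"], summary["non_empty_rows"], summary["max_columns"])
    for row in summary["preview"]:
        print("   ", row)
```

A sheet whose preview shows only titles and notes, like the hypothetical "Summary" sheet here, is a likely candidate to leave out of the analysis.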
Also, share your experiences and strategies in relevant communities or forums. Not only will you receive valuable advice, but you will also contribute to the growth of a collaborative learning environment.
In the end, remember that data cleaning and preparation are the foundation of rigorous and meaningful analysis. Although they are often time-consuming, they are key to generating accurate and valuable insights. So go ahead, explore, and don't be discouraged by these challenges! The ability to solve data problems will open up opportunities to uncover hidden stories that can have a significant impact on society.