The candidate will develop a software application that performs the following tasks:
- Data Extraction: Extract data from provided structured sources (e.g., CSV, JSON files) and unstructured sources (e.g., company reports, press releases in PDF or HTML format).
- Data Processing: Clean, normalize, and structure the extracted data for analysis. This may involve handling inconsistencies, missing values, and transforming unstructured data into a structured format.
- Data Analysis: Implement basic analysis on the processed data to derive insights related to public companies. This could include calculating financial ratios, sentiment analysis on management discussion sections of annual reports, or identifying trends in the data.
- Output Generation: Produce a summary report or dashboard that presents the analysis results, highlighting key insights and trends identified from the data.
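As one illustration of how the extraction, processing, and analysis tasks could fit together for structured sources, here is a minimal sketch using only the Python standard library. The sample data, the column names (`ticker`, `revenue`, `net_income`, `total_debt`, `total_equity`), and the helper names are all hypothetical; the real exercise supplies its own files, and a library such as Pandas would be an equally reasonable choice.

```python
import csv
import io
import json

# Hypothetical sample inputs; the real exercise supplies its own files.
CSV_TEXT = """ticker,revenue,net_income
ACME,1000,120
BETA,,45
GAMA,"2,500",-50
"""
JSON_TEXT = (
    '[{"ticker": "ACME", "total_debt": 400, "total_equity": 800},'
    ' {"ticker": "GAMA", "total_debt": 900, "total_equity": null}]'
)


def to_number(raw):
    """Convert a raw field to float; return None for missing or malformed values."""
    if raw is None:
        return None
    text = str(raw).replace(",", "").strip()  # drop thousands separators
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def safe_ratio(numerator, denominator):
    """Divide two values; return None if either is missing or the denominator is zero."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def build_records(csv_text, json_text):
    """Merge both sources by ticker and attach two illustrative ratios."""
    records = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        records[row["ticker"]] = {
            "revenue": to_number(row["revenue"]),
            "net_income": to_number(row["net_income"]),
        }
    for item in json.loads(json_text):
        rec = records.setdefault(item["ticker"], {})
        rec["total_debt"] = to_number(item.get("total_debt"))
        rec["total_equity"] = to_number(item.get("total_equity"))
    for rec in records.values():
        rec["net_margin"] = safe_ratio(rec.get("net_income"), rec.get("revenue"))
        rec["debt_to_equity"] = safe_ratio(rec.get("total_debt"), rec.get("total_equity"))
    return records


records = build_records(CSV_TEXT, JSON_TEXT)
```

Missing or malformed inputs become `None` rather than zero, so a ratio that cannot be computed stays visibly absent instead of silently skewing the analysis.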
The candidate should submit the following:
- Source code for the application, including clear comments and documentation.
- A README file that provides instructions on how to set up and run the application, including any dependencies that need to be installed.
- A brief report or dashboard (a simple web page, PDF, or structured text document) summarizing the key findings from the data analysis.
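For the report deliverable, a structured text document is probably the quickest option within the time limit. The sketch below shows one possible way to render findings as a plain-text table; the function names and the figures are hypothetical and only illustrate the idea.

```python
def format_ratio(value):
    """Render a ratio as a percentage, or 'n/a' when it is missing."""
    return "n/a" if value is None else f"{value:.1%}"


def render_report(title, rows):
    """Build a plain-text table of (ticker, net_margin) rows, highest margin first."""
    # Missing margins sort last so incomplete data does not hide real results.
    ordered = sorted(rows, key=lambda r: (r[1] is None, -(r[1] or 0.0)))
    lines = [title, "=" * len(title), f"{'Ticker':<8}{'Net margin':>12}"]
    for ticker, margin in ordered:
        lines.append(f"{ticker:<8}{format_ratio(margin):>12}")
    return "\n".join(lines)


# Hypothetical figures for illustration only.
report = render_report(
    "Key Findings",
    [("BETA", None), ("ACME", 0.12), ("GAMA", -0.02)],
)
print(report)
```

The same rows could instead be written to an HTML page or plotted with Matplotlib or Plotly if a dashboard is preferred.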
The application should be developed using programming languages and tools appropriate for data processing and analysis (e.g., Python, R, JavaScript). Use of libraries or frameworks for data extraction (e.g., Pandas, BeautifulSoup, PyPDF2), data analysis (e.g., NumPy, SciPy), and data visualization (e.g., Matplotlib, Plotly) is encouraged. The application should handle both structured and unstructured data sources, demonstrating the candidate's ability to work with diverse data formats.
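To make the unstructured side concrete, the following sketch pulls visible text out of an HTML press release and computes a crude word-list sentiment score. It deliberately uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra dependencies; the word lists, the sample HTML, and the scoring formula are illustrative assumptions, not a validated financial lexicon or a recommended method.

```python
import re
from html.parser import HTMLParser

# Tiny illustrative word lists; hypothetical, not a validated financial lexicon.
POSITIVE = {"growth", "improved", "strong", "record"}
NEGATIVE = {"decline", "loss", "weak", "impairment"}


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML document as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def sentiment_score(text):
    """Return (positive - negative) / matched words, or 0.0 if nothing matched."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


SAMPLE_HTML = """
<html><head><style>p {color: red;}</style></head>
<body><h1>Press Release</h1>
<p>Revenue showed strong growth, although one segment reported a loss.</p>
<script>var weak = 1;</script></body></html>
"""

text = html_to_text(SAMPLE_HTML)
score = sentiment_score(text)
```

The score ranges from -1.0 (only negative words matched) to 1.0 (only positive words matched). A PDF source would need a separate text-extraction step, for example with PyPDF2, before the same scoring could be applied.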
Candidates are advised not to spend more than 2-3 hours on this exercise. It is understood that this time constraint may limit the scope of what can be achieved. Therefore, candidates should focus on demonstrating their approach to solving the problem, their ability to write clean and efficient code, and their skill in extracting meaningful insights from the data within the allotted time.
Submissions will be evaluated on the following criteria:
- Code Quality: Clarity, structure, and documentation of the code.
- Data Processing Skills: Effectiveness in handling and transforming both structured and unstructured data.
- Analytical Ability: Capability to derive meaningful insights from the data.
- Creativity and Innovation: Use of innovative approaches to extract, process, and analyze the data.