Data Science Projects:

1. Fraudulent claims detection using Natural Language Processing

About: Improved fraud detection models for a US insurance company by applying text analytics to claim notes. Used Natural Language Processing (NLP) techniques such as topic modeling and named-entity extraction to identify key features, which were then fed into machine learning models such as XGBoost and Random Forest. The resulting models captured significantly more fraudulent cases than traditional methods.

Skills: Topic Modeling, Word Vectorization, Python, PySpark, Databricks

2. Credit scoring and loan loss reserve models for an Auto Finance Company

About: Developed statistical models to predict credit defaults and built credit scoring systems. Analyzed historical borrower data, including financial behaviors and payment patterns, to identify key indicators of default. Applied techniques such as logistic regression and decision trees to build credit scoring models that supported lending decisions, risk assessment, and interest rate setting.

Skills: Regression Techniques, SAS, Excel, Presentation, GitHub
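
The logistic-regression step above can be sketched as follows: a fitted logistic model maps a borrower's characteristics to a probability of default (PD). This is a minimal illustration only; the feature names and coefficients are hypothetical placeholders, not values from the project, and the decision-tree models are not shown.

```python
import math

# Hypothetical coefficients, for illustration only. In practice they would be
# estimated from historical borrower data (e.g. by maximum likelihood).
COEFFICIENTS = {
    "months_delinquent_last_year": 0.45,
    "debt_to_income": 2.0,
    "years_with_lender": -0.15,
}
INTERCEPT = -3.0


def probability_of_default(borrower: dict[str, float]) -> float:
    """Logistic model: PD = 1 / (1 + exp(-(b0 + sum_k b_k * x_k)))."""
    linear_score = INTERCEPT + sum(
        beta * borrower[name] for name, beta in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_score))


if __name__ == "__main__":
    low_risk = {"months_delinquent_last_year": 0, "debt_to_income": 0.2, "years_with_lender": 5}
    high_risk = {"months_delinquent_last_year": 4, "debt_to_income": 0.6, "years_with_lender": 1}
    print(f"low-risk PD:  {probability_of_default(low_risk):.3f}")
    print(f"high-risk PD: {probability_of_default(high_risk):.3f}")
```

A scoring system would then apply a cutoff or map PD onto score bands; how that mapping was done in the project is not described here.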

3. Demand Forecasting for a Fortune 500 beverage company using time series models

About: Developed demand forecasting and market mix models for one of the world’s largest alcoholic beverage companies. Analyzed historical sales, weather patterns, macroeconomic trends, and marketing data to forecast demand.

Skills: ARIMA, VAR, R, Excel, Presentation

4. Marketing Vehicle Optimization using Mixed-Effects Models

About: Used mixed-effects models to perform market mix analysis, identifying the most effective marketing vehicles and optimizing return on investment (ROI). This work improved forecast precision and informed strategic marketing decisions. Developed a Python-based model implementation pipeline that automated sales data acquisition, model scoring, and ROI calculations.

Skills: Random- & Fixed-Effects Models, Python, Excel, GitHub

5. Digital Transformation for the largest Auto Finance company in India

About: Spearheaded a project to overhaul the data infrastructure of a prominent auto finance company. The team replicated the company's existing database system and built both a data warehouse and an S3-based data lake, an architecture that integrated large volumes of structured and unstructured data. I also served as a subject matter expert on automating the company's reporting process, developing dynamic Qlik Sense dashboards that improved operational efficiency, reduced time spent on manual reporting, and supported data-driven decision-making.

Skills: Visualization, Database Management, S3, Redshift, SQL, Qlik Sense

6. Automated data extraction from PDFs using Python

About: At the World Bank, I automated the extraction of trade data from PDFs by developing a custom Python script. The script used Optical Character Recognition (OCR), via the Tesseract engine, to extract text from scanned (image-based) PDFs. The extracted text was then parsed into a structured format and stored in a relational database for efficient retrieval and further analysis. The automation significantly reduced manual effort, improved data accuracy, and scaled to large volumes of documents.

Skills: NLP, Web Scraping, OCR, Python
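
The parse-and-store stage described above can be sketched as follows. The OCR call itself is left out, since it needs the external Tesseract engine (for example through pytesseract's image_to_string); the sketch starts from the kind of plain text OCR returns. The table layout, the placeholder names "Country A"/"Country B", the figures, and the SQLite table are all hypothetical stand-ins, not the real World Bank documents or database.

```python
import re
import sqlite3

# Hypothetical OCR output: the real PDF layout is not described, so a simple
# "reporter / exports / imports" table with placeholder names is assumed.
OCR_TEXT = """\
Reporter        Exports      Imports
Country A       6,512.4     17,289.0
Country B     371,715.2    359,850.6

Page 1 of 12
"""

# A data row: a name (letters, spaces, basic punctuation), at least two spaces,
# then two numbers that may contain thousands separators.
ROW_PATTERN = re.compile(
    r"^(?P<reporter>[A-Za-z][A-Za-z .'-]*?)\s{2,}"
    r"(?P<exports>\d[\d,]*(?:\.\d+)?)\s+"
    r"(?P<imports>\d[\d,]*(?:\.\d+)?)\s*$"
)


def parse_rows(text: str) -> list[tuple[str, float, float]]:
    """Turn OCR text into (reporter, exports, imports) tuples, skipping non-data lines."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match is None:  # headers, blank lines, page footers, OCR noise
            continue
        rows.append(
            (
                match["reporter"].strip(),
                float(match["exports"].replace(",", "")),
                float(match["imports"].replace(",", "")),
            )
        )
    return rows


def load_rows(conn: sqlite3.Connection, rows: list[tuple[str, float, float]]) -> None:
    """Store parsed rows in a relational table (SQLite stands in for the real database)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trade (reporter TEXT PRIMARY KEY, exports REAL, imports REAL)"
    )
    conn.executemany("INSERT INTO trade (reporter, exports, imports) VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_rows(conn, parse_rows(OCR_TEXT))
    for row in conn.execute("SELECT reporter, exports, imports FROM trade ORDER BY reporter"):
        print(row)
```

Matching whole rows with a strict pattern, rather than splitting on whitespace, lets headers and page footers fall through without special-casing; real OCR output would likely need extra cleanup rules.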

Research Projects:

1. Understanding Subjective Well-Being & Relative Income

About: At the World Bank, I am currently working on a research project on the econometric relationship between subjective well-being and relative income using a global dataset. The analysis uses household-level survey data spanning approximately 160 countries and multiple time periods. I am using ordered probit models, which account for the ordinal nature of the well-being outcome and for heteroscedasticity, to estimate the effect of relative income on reported well-being. The models control for various socioeconomic covariates, allowing robust cross-country and temporal comparisons.

Skills: Research, Literature Review, Survey, Panel Data Regression, STATA, Python
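
For reference, the ordered probit setup mentioned above can be written as below. This is a generic textbook formulation, not the project's exact specification: the notation is illustrative, $\mathbf{x}_i$ stands for relative income plus socioeconomic covariates (no constant, since it is absorbed by the cut points $\mu_j$), and modeling heteroscedasticity through $\sigma_i = \exp(\mathbf{z}_i'\gamma)$, with $\mathbf{z}_i$ excluding a constant, is an assumed, commonly used choice.

```latex
\begin{aligned}
y_i^* &= \mathbf{x}_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}\!\left(0, \sigma_i^2\right), \quad \sigma_i = \exp(\mathbf{z}_i'\gamma) \\
y_i &= j \quad \text{if } \mu_{j-1} < y_i^* \le \mu_j, \qquad j = 1, \dots, J, \quad \mu_0 = -\infty,\ \mu_J = +\infty \\
\Pr(y_i = j \mid \mathbf{x}_i, \mathbf{z}_i) &= \Phi\!\left(\frac{\mu_j - \mathbf{x}_i'\beta}{\exp(\mathbf{z}_i'\gamma)}\right) - \Phi\!\left(\frac{\mu_{j-1} - \mathbf{x}_i'\beta}{\exp(\mathbf{z}_i'\gamma)}\right)
\end{aligned}
```

With $\gamma = 0$ this reduces to the homoscedastic ordered probit. A positive coefficient on relative income raises the probability of the highest well-being category and lowers that of the lowest; effects on intermediate categories can go either way.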

2. Improving our understanding of illegal opioid supply networks 

About: As a Research Assistant at Heinz College, Carnegie Mellon University, I contributed to a National Science Foundation-funded project on the structure of illegal opioid supply networks. I used Natural Language Processing (NLP) and geospatial mapping to extract and analyze price information from large volumes of web data, using custom regular expressions and topic modeling. I also designed metrics to quantify how quickly fentanyl, methamphetamine, and heroin diffused across the United States, and developed animated GIS maps to depict these patterns over time. The resulting paper, on which I am a co-author, is in the final stage before submission to a peer-reviewed journal.

Skills: Research, Probability, NLP, GIS
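
Regex-based price extraction of the kind described above can be sketched as follows. The project's actual expressions are not described here, so this pattern is a hypothetical, deliberately small example that assumes prices are written in US dollars; the sample text is invented, and the topic-modeling step is not shown.

```python
import re

# Hypothetical pattern: matches "$80", "$ 1,200.50", "200 dollars" and "500 USD"
# (case-insensitive). Real listings would need many more variants.
PRICE_PATTERN = re.compile(
    r"\$\s*(?P<pre>\d[\d,]*(?:\.\d{1,2})?)"  # currency sign before the amount
    r"|(?P<post>\d[\d,]*(?:\.\d{1,2})?)\s*(?:usd|dollars?)\b",  # currency word after
    re.IGNORECASE,
)


def extract_prices(text: str) -> list[float]:
    """Return every dollar amount found in a piece of web text, in order."""
    prices = []
    for match in PRICE_PATTERN.finditer(text):
        amount = match["pre"] or match["post"]
        prices.append(float(amount.replace(",", "")))
    return prices


if __name__ == "__main__":
    post = "Selling 1g for $80, 3g for 200 dollars. Bulk: $1,200.50 or 500 USD."
    print(extract_prices(post))  # [80.0, 200.0, 1200.5, 500.0]
```

Requiring a currency marker on one side of the number keeps quantities such as "1g" or "3g" from being read as prices.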

3. Corporate Governance & Corporate Social Responsibility of Indian Companies

About: I contributed to data collection and analysis for a research project that culminated in a published book acknowledging my contributions. The book examines the theoretical and empirical aspects of the interplay between corporate governance and corporate social responsibility (CSR) practices among Indian companies, offering an in-depth analysis of the evolution of CSR and its relationship with corporate governance.

Skills: Research, Literature Review, Excel

Link: https://www.amazon.com/Corporate-Governance-Responsibility-Companies-Sustainability/dp/9811092842

Academic Projects:

1. Created a GIS-based story map on food insecurity in D.C.

About: Developed a GIS story map for a class assignment, presenting the findings of our research on food insecurity in D.C.

Link: https://storymaps.arcgis.com/stories/3fe822c600734b0aa4fe6258f00b48db

Skills: Research, GIS, Visualization, Storytelling

2. Created a GIS-based informational dashboard on SNAP participation, Transportation, and Groceries

About: Developed a GIS dashboard for my capstone project. Our clients use the dashboard to educate D.C. Council members about food security conditions and challenges in the D.C. area.

Link: https://carnegiemellon.maps.arcgis.com/apps/dashboards/ee13a35986aa4e38a265e6f868c37ed3

Skills: GIS, Data Analysis, Visualization