About Me

My name is Scott, and I love working with data.

Throughout my career as a water/wastewater engineer and consultant, I've developed a deep appreciation for what data can do. I've spent much of my free time exploring tools and resources for parsing, analyzing, and using data effectively, and I share what I learn by blogging on Medium. Feel free to explore some of my projects showcased below!

Why Data Science?

My enthusiasm for data science stems from my commitment to using data in ways that have a real-world impact. I was initially drawn to civil engineering by a desire to design solutions that improve people's everyday lives, and my passion for that field remains strong. Data science gives me an opportunity to extend those solutions by:

Crafting systems that streamline daily tasks and enhance user experiences
Uncovering actionable insights that enable businesses to connect with their target audiences effectively
Creating visually compelling and informative data visualizations that drive informed decision-making

RESUME

Project Portfolio

Hover over each project to see more details, including links to code and an associated blog article summarizing the project.

Urban Environmental Audio Classification Using Mel Spectrograms

Language: Python

Description: Using mel spectrograms generated from audio files included in the UrbanSound8K dataset, trained a CNN to classify urban sounds.

Skills: Audio Classification, Time/Frequency Domain Representations of Audio Data, Mel Spectrograms, Fourier Transforms, Cross Validation, Convolutional Neural Networks (CNNs), Learning Rate Schedulers

Visualizations: Matplotlib, Seaborn, Librosa

Tools: NumPy, Pandas, Librosa, Scikit-learn, PIL, PyTorch

ARTICLE | CODE

Categorical Clustering of Pittsburgh Car Accidents Using K-Modes

Language: Python

Description: Implemented a categorical clustering algorithm (k-modes) to cluster data on car accidents that occurred in Pittsburgh, PA, from 2010 to 2019. Conducted an EDA comparing the assigned clusters to one another.

Skills: Unsupervised Learning, Categorical Data Clustering (k-Modes), Elbow Method for Selecting the Number of Clusters, Chi-square Test, Exploratory Data Analysis

Visualizations: Matplotlib, Seaborn, GeoPandas

Tools: NumPy, Pandas, kmodes, SciPy, Scikit-learn

ARTICLE | CODE

San Francisco Crime Classification

Language: Python

Description: Developed models to predict the category of a crime based on its geographic location, using 12 years of historical crime data from San Francisco.

Skills: Multiclass Classification, Gradient-Boosting Algorithms, Random Forest, Fully Connected Neural Network, Clustering (Gaussian Mixture Models), Log-Odds Ratios, Word Embeddings (Word2Vec), Cross Validation, Ensembling

Visualizations: Matplotlib, Seaborn, Folium

Tools: NumPy, Pandas, Scikit-learn, PIL, LightGBM, XGBoost, CatBoost, TensorFlow, Keras

ARTICLE | CODE

Fine-Tuning Language Models for Sentiment Analysis

Language: Python

Description: Developed sentiment classification models by fine-tuning pre-trained language models (BERT, RoBERTa, DistilBERT) using financial news statements.

Skills: Pre-Trained Language Models, Transformers, Sentiment Classification, Fine-Tuning, Evaluation Metrics (Accuracy / Precision / Recall / F1 Score), Natural Language Processing

Visualizations: Matplotlib, Seaborn

Tools: NumPy, Pandas, Scikit-learn, PyTorch, Transformers (BERT, DistilBERT, RoBERTa), NLTK

ARTICLE | CODE

Predicting Energy Consumption (Part 1)

Language: Python

Description: Conducted an EDA of hourly energy consumption data and compared the performance of ARIMA time series forecasting methods.

Skills: Exploratory Data Analysis, Regression, Seasonal Decomposition, Stationarity, Moving Averages, Autoregression & Autocorrelation, ARIMA Models, Augmented Dickey-Fuller Test

Visualizations: Matplotlib, Seaborn

Tools: NumPy, Pandas, statsmodels, Scikit-learn

ARTICLE | CODE

Predicting Energy Consumption (Part 2)

Language: Python

Description: Expanded on the work completed in Part 1, evaluating advanced time series forecasting methods using hourly energy consumption data.

Skills: Regression, Simple Exponential Smoothing, Triple Exponential Smoothing (Holt-Winters Method), LSTM Neural Networks, Prophet

Visualizations: Matplotlib, Seaborn

Tools: NumPy, Pandas, statsmodels, Scikit-learn, Keras, TensorFlow, Prophet

ARTICLE | CODE
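
For readers unfamiliar with the simplest method listed above, here is a minimal sketch of simple exponential smoothing in plain Python. It is an illustration only, not the project's code: the function name, the `alpha` value, and the `hourly_load` numbers are all made up for this example, and the project itself lists statsmodels among its tools.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level at each time step.

    Uses the standard recursion:
        level[0] = y[0]
        level[t] = alpha * y[t] + (1 - alpha) * level[t - 1]
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")

    levels = [series[0]]
    for y in series[1:]:
        levels.append(alpha * y + (1 - alpha) * levels[-1])
    return levels


# Hypothetical hourly consumption values, invented for illustration.
hourly_load = [100.0, 120.0, 110.0, 130.0]

levels = simple_exponential_smoothing(hourly_load, alpha=0.5)

# With simple exponential smoothing, the one-step-ahead forecast
# is just the most recent smoothed level.
forecast = levels[-1]
print(levels, forecast)
```

A larger `alpha` weights recent observations more heavily. This method has no trend or seasonal terms, which is the gap the Holt-Winters (triple exponential smoothing) method is designed to fill.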

Audio Analysis: How to Impress (or Disappoint) Pitchfork

Language: Python

Description: Using the Spotify Web API, extracted audio features for songs featured on albums listed in the Pitchfork album review dataset. Compared songs from the top and bottom 10% of albums in the dataset, sorted by review score, to identify specific features that may be correlated with a high album review score.

Skills: Web Scraping, Exploratory Data Analysis, Mann-Whitney U Test, Common Language Effect Size

Visualizations: Matplotlib, Seaborn

Tools: NumPy, Pandas, Spotipy, SciPy

ARTICLE | CODE
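
As a rough illustration of the two statistics named above, the sketch below computes a Mann-Whitney U statistic and the common language effect size for two small groups in plain Python. The feature values are invented for this example and are not taken from the Pitchfork or Spotify data, and the helper names are hypothetical. The project lists SciPy among its tools, and in practice `scipy.stats.mannwhitneyu` would also supply a p-value.

```python
from itertools import product


def mann_whitney_u(a, b):
    """U statistic for sample a: the number of (x, y) pairs with x > y,
    where x comes from a and y from b. Ties count as half a pair."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x, y in product(a, b)
    )


def common_language_effect_size(a, b):
    """Estimated probability that a random value from a
    exceeds a random value from b (U divided by the pair count)."""
    return mann_whitney_u(a, b) / (len(a) * len(b))


# Invented values of a single audio feature for two groups of songs.
top_albums = [0.62, 0.71, 0.55, 0.80, 0.67]
bottom_albums = [0.48, 0.59, 0.52, 0.66, 0.41]

u = mann_whitney_u(top_albums, bottom_albums)
cles = common_language_effect_size(top_albums, bottom_albums)
print(f"U = {u}, CLES = {cles:.2f}")
```

A common language effect size near 0.5 means the two groups are hard to tell apart on that feature, while values near 0 or 1 indicate strong separation.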

Generating an Edgar Allan Poe-Styled Poem Using GPT-2

Language: Python

Description: Fine-tuned a pre-trained language model (GPT-2) using the complete poetical works of Edgar Allan Poe to generate poetry in the author’s style.

Skills: Pre-Trained Language Models, Web Scraping, Unsupervised Language Models, Text Generation, Transformers, Natural Language Processing

Tools: NumPy, Pandas, BeautifulSoup, Transformers (GPT-2), PyTorch

ARTICLE | CODE

UFO Sighting Explorer

Language: R

Description: Developed an interactive web application that lets users explore data on more than 60,000 UFO sightings reported in the U.S. from 1949 to 2013.

Skills: Web Application Development, Data Cleaning

Visualizations: Plotly

Tools: Shiny, dplyr

CODE | APPLICATION