ChatGPT Capabilities for Data Scientists: Common Duties Automated

A guide to how ChatGPT handles data cleaning, exploration, visualization, model development, and related tasks.

, and Administrator

2025 August 27 . 1:59 AM

Data Scientist Jobs Made Easier: 5 Regular Tasks Automated by ChatGPT

In data science, efficiency matters. ChatGPT is drawing attention for its ability to handle routine data science tasks, streamlining the workflow so that data scientists can focus more on interpretation and decision-making.

A recent project set out to understand why some customers failed to get a car, by examining key matching metrics. For this, a Streamlit app was built using the Gemini CLI. The app, demonstrated on a data project from Gett, a London-based taxi app, shows each step of the workflow in a separate tab.

ChatGPT, when prompted, can manage the entire workflow, from cleaning and organizing the data, to performing exploratory data analysis and visualization, preparing the dataset for machine learning, applying models, and even creating a Streamlit dashboard for quick interaction with the data pipeline.

The five core tasks that ChatGPT can handle in a data project are:

Data Cleaning and Preprocessing: Handling missing data, detecting outliers, encoding categorical variables, normalizing or standardizing data.
Exploratory Data Analysis (EDA): Summarizing datasets with descriptive statistics, identifying patterns or trends, and generating textual explanations of key metrics.
Data Visualization: Assisting in creating charts and graphs by providing code examples to visualize data insights.
Model Building and Evaluation: Helping to write code for training machine learning models, tuning parameters, and evaluating performance using appropriate metrics.
Documentation and Reporting: Producing clear, concise summaries of findings, interpreting model results, and generating sections of project reports to communicate insights effectively.

Automating these tasks can substantially reduce the time spent on routine coding and analysis, leaving more room for interpretation and decision-making.
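As an illustration of the EDA task listed above, the following is a minimal sketch of the kind of summary code ChatGPT might produce when asked for descriptive statistics. The toy data and the order_status column are assumptions made for this example; only m_order_eta is a column named in the project.

```python
import pandas as pd

# Hypothetical toy data. Only m_order_eta is named in the article;
# order_status and all values are assumptions for illustration.
orders = pd.DataFrame({
    "order_status": [
        "cancelled_by_client",
        "rejected_by_system",
        "cancelled_by_client",
        "cancelled_by_client",
    ],
    "m_order_eta": [60.0, None, 120.0, 180.0],
})


def summarize(df: pd.DataFrame):
    """Return descriptive statistics for numeric columns and per-column missing counts."""
    numeric_summary = df.describe()   # count, mean, std, min, quartiles, max
    missing_counts = df.isna().sum()  # number of missing values per column
    return numeric_summary, missing_counts


numeric_summary, missing_counts = summarize(orders)

# Frequency of each category, a common first look at a categorical column
status_counts = orders["order_status"].value_counts()

print(numeric_summary)
print(missing_counts)
print(status_counts)
```

In practice, the printed tables would be the starting point for the textual explanations of key metrics described in the list.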

The data project in question, which analyzes failed rider orders from Gett, ran into challenges such as missing values in both of its datasets. ChatGPT converted the date columns, dropped invalid orders, and imputed missing values in the m_order_eta column, leaving the data clean and ready for analysis.
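The cleaning steps described above could look roughly like the sketch below. It is not the code from the project: the order_id and order_datetime column names, the rule for what counts as an invalid order, and the choice of median imputation are all assumptions made for illustration.

```python
import pandas as pd


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Convert dates, drop invalid orders, and impute m_order_eta.

    Assumptions (not from the article): the timestamp column is called
    order_datetime, an "invalid" order is one with no usable timestamp
    or order_id, and missing ETAs are filled with the median.
    """
    out = df.copy()

    # Convert the date column; unparseable values become NaT
    out["order_datetime"] = pd.to_datetime(out["order_datetime"], errors="coerce")

    # Drop orders that lack a valid timestamp or identifier
    out = out.dropna(subset=["order_datetime", "order_id"])

    # Impute missing ETAs with the median of the remaining rows
    median_eta = out["m_order_eta"].median()
    out["m_order_eta"] = out["m_order_eta"].fillna(median_eta)

    return out.reset_index(drop=True)


# Hypothetical sample rows for demonstration
raw = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "order_datetime": [
        "2025-01-01 18:08:07",
        "not a time",
        "2025-01-01 20:57:32",
        "2025-01-02 12:07:50",
    ],
    "m_order_eta": [60.0, 30.0, None, 180.0],
})

cleaned = clean_orders(raw)
print(cleaned)
```

Median imputation is used here because it is less sensitive to outliers than the mean; a real project might choose a different strategy depending on why the values are missing.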

Preparing the dataset for machine learning involved encoding categorical variables, scaling numerical features, and returning a clean DataFrame ready for modeling. The steps of the modeling process were then explained, and the model's performance was reported using evaluation metrics such as accuracy, precision, recall, and F1-score.
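A minimal sketch of this preparation and evaluation flow, using scikit-learn, is shown below. The data is synthetic, and the feature names (apart from m_order_eta), the target definition, and the choice of logistic regression as the "basic" model are assumptions for illustration rather than details from the project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; every column except m_order_eta is hypothetical
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "m_order_eta": rng.uniform(30, 600, n),
    "hour": rng.integers(0, 24, n),
    "cancel_source": rng.choice(["client", "system"], n),
})
# Hypothetical binary target, defined here only so the example has signal
data["target"] = (data["m_order_eta"] > 300).astype(int)

X = data.drop(columns="target")
y = data["target"]

# Scale numerical features and one-hot encode the categorical one
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["m_order_eta", "hour"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["cancel_source"]),
])

# A basic model, chained after preprocessing so both are fit on training data only
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# The four evaluation metrics mentioned in the text
metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred),
    "recall": recall_score(y_test, pred),
    "f1": f1_score(y_test, pred),
}
print(metrics)
```

Wrapping the encoder and scaler in a pipeline keeps the test split out of the preprocessing fit, which avoids leaking information into the evaluation.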

It's worth noting that only five relevant features were used in the model, which was a basic machine learning model trained to predict a target variable.

Nate Rosidi, a data scientist, adjunct professor, and founder of StrataScratch who contributed to this article, highlights ChatGPT's practical ability to manage these five key tasks when given suitable prompts.

Moreover, the use of Gemini CLI for handling routine data science tasks, including building a Streamlit app that automates EDA, data cleaning, visualization, and modeling, was instrumental in the success of this project.

ChatGPT can also generate visualizations from content at a provided link. Pulling in external material to ground the model's output in this way is the idea behind Retrieval-Augmented Generation (RAG), and it further extends ChatGPT's usefulness in data science projects.

According to the data science report by Anaconda, data scientists spend nearly 60% of their time cleaning and organizing data. Tools like ChatGPT and Gemini CLI could reduce that share, making data science more efficient and accessible.

[1] Data Science Report by Anaconda
[2] Article by Nate Rosidi on StrataScratch
[3] ChatGPT Demonstration using Gett Dataset
