
ChatGPT Prompts for Data Analysis: 7 Copy-Ready Templates (2026)

You paste a CSV description into ChatGPT and get back three bullet points that could apply to any dataset. The problem isn't ChatGPT — it's the prompt. Data analysis is a structured process, and ChatGPT needs to know your role, the specific dataset context, the exact task, and the output format before it can give you anything worth using.

These seven templates follow a four-element structure, (Role) (Context) (Task) (Format), and cover the most common data analysis workflows: cleaning, exploratory analysis, chart selection, statistical interpretation, insight summaries, and end-to-end pipelines. Every template is ready to copy: fill in the [brackets] with your specifics. No fluff, no "here are some ideas." Just prompts that work.

Figure: Structured prompts are the difference between a three-bullet response and a usable analytical output.

Why Most Data Analysis Prompts Fail — and What to Include Instead

Generic data prompts fail because ChatGPT has no idea what kind of analyst you are, what the dataset represents, what specific question you need answered, or what format the output should take. The fix is a four-element prompt: (Role) establishes domain expertise, (Context) describes the data and the business situation, (Task) names the specific operation, and (Format) defines how the output should be structured. These four elements convert a vague request into a precise analytical brief.


Think of it this way: if you hired a freelance analyst and said "analyze my sales data," you'd get a generic report. If you said "you're a retail analyst, here's our Q2 regional sales table with columns for region, SKU, revenue, and return rate — find the three biggest contributors to return rate and format your findings as a ranked list with a one-sentence explanation each," you'd get something you could act on. The same principle applies to ChatGPT.
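As a rough illustration of the four-element idea, the sketch below assembles a prompt from its parts and refuses to render one with a blank element. The class name `FourElementPrompt` and its field names are invented for this example; they are not part of any ChatGPT tooling.

```python
from dataclasses import dataclass


@dataclass
class FourElementPrompt:
    """Illustrative container for the four prompt elements."""
    role: str
    context: str
    task: str
    fmt: str

    def render(self) -> str:
        # A blank element is what turns a brief back into a generic request,
        # so fail loudly instead of rendering an incomplete prompt.
        for name, value in vars(self).items():
            if not value.strip():
                raise ValueError(f"missing prompt element: {name}")
        return "\n".join([
            f"(Role) {self.role}",
            f"(Context) {self.context}",
            f"(Task) {self.task}",
            f"(Format) {self.fmt}",
        ])


# The freelance-analyst brief from the paragraph above, split into elements.
prompt = FourElementPrompt(
    role="You are a retail analyst.",
    context=("Here is our Q2 regional sales table with columns for region, "
             "SKU, revenue, and return rate."),
    task="Find the three biggest contributors to return rate.",
    fmt="Ranked list with a one-sentence explanation each.",
)
prompt_text = prompt.render()
print(prompt_text)
```

Keeping the elements as separate fields makes it easy to reuse one role and context across several tasks.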

According to McKinsey's 2023 report on the economic potential of generative AI, productivity gains are most concentrated in tasks with high interpretation and communication load — exactly where structured ChatGPT prompts pay off most. (Source: McKinsey Global Institute, "The Economic Potential of Generative AI," June 2023)

Before vs. After: The Four-Element Structure in Action

Before — Weak Prompt (Typical Result)
You

Can you analyze this dataset and tell me what's wrong with it?

Columns: customer_id, purchase_date, amount, product_category, region

I have about 15,000 rows from last quarter.

ChatGPT

Sure! Here are some common data quality issues to look for in your dataset:

- Missing values in any column
- Duplicate customer IDs
- Amounts that seem unusually high or low
- Dates that fall outside the expected range
- Inconsistent formatting in the product_category or region fields

Would you like me to help you write some code to check for these?

Result: Generic checklist. No prioritization, no column-specific logic, no actionable output. Could apply to any dataset on earth.
Now with the four-element structure
After — Four-Element Prompt (Usable Output)
You

(Role) You are a data quality analyst specializing in e-commerce transaction data.
(Context) I have 15,000 rows of Q2 sales data with columns: customer_id (int), purchase_date (YYYY-MM-DD), amount (float, USD), product_category (string: Electronics / Apparel / Home / Beauty), region (string: North / South / East / West). This will be used to build a Q2 performance dashboard for the executive team.
(Task) Audit the dataset for the three highest-risk data quality issues that could distort a revenue dashboard. For each issue: name it, explain why it matters for dashboards specifically, and give a pandas one-liner to detect it.
(Format) Numbered list, 3 items. Each item: Issue name (bold) → Business impact (1 sentence) → Detection code (code block).

ChatGPT

1. Negative or zero amounts
Business impact: Returns and refunds recorded as negative amounts reduce the total when summed, so a dashboard meant to show gross revenue will understate it. This is a common source of dashboard discrepancies.
df[df['amount'] <= 0]

2. Dates outside Q2 range (April 1 – June 30)
Business impact: Rows from prior or future quarters inflate Q2 metrics if they slip past the dashboard's date filter, for example when the filter applies only a one-sided >= condition.
df[(df['purchase_date'] < '2026-04-01') | (df['purchase_date'] > '2026-06-30')]

3. Invalid product_category values
Business impact: Any row with a misspelled or undocumented category (e.g., "Electroncs", "N/A") will be dropped from category-level breakdowns, creating an invisible "other" bucket.
df[~df['product_category'].isin(['Electronics','Apparel','Home','Beauty'])]

Result: Specific to your data, prioritized by business impact, immediately actionable. Three things you can run today.
(Role) e-commerce analyst + (Context) 15k rows, column types, use case + (Task) top-3 risks with dashboard impact + (Format) numbered, bold issue, impact sentence, code block
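To try the three detection one-liners from the exchange above in one pass, a sketch like the following may help. The four sample rows are made up for illustration, and the 2026 date bounds simply follow the year used in the example reply.

```python
import pandas as pd

# Hypothetical sample rows: the last three each carry one of the three issues.
df = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "purchase_date": ["2026-04-15", "2026-05-02", "2026-03-28", "2026-06-30"],
    "amount": [59.90, -20.00, 120.00, 35.50],
    "product_category": ["Electronics", "Apparel", "Home", "Electroncs"],
    "region": ["North", "South", "East", "West"],
})

# Parse dates once so the range check is a true date comparison.
df["purchase_date"] = pd.to_datetime(df["purchase_date"], format="%Y-%m-%d")

valid_categories = ["Electronics", "Apparel", "Home", "Beauty"]
q2_start, q2_end = pd.Timestamp("2026-04-01"), pd.Timestamp("2026-06-30")

# One boolean mask per issue; True marks an affected row.
checks = {
    "negative_or_zero_amount": df["amount"] <= 0,
    "date_outside_q2": ~df["purchase_date"].between(q2_start, q2_end),
    "invalid_category": ~df["product_category"].isin(valid_categories),
}

issue_counts = {name: int(mask.sum()) for name, mask in checks.items()}
print(issue_counts)
```

Collecting the masks in a dictionary keeps each check readable on its own while still giving a single summary to paste back into the conversation.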

The structure also tells ChatGPT what level of output to produce. Saying "you are a data quality analyst" prevents it from giving a beginner's tutorial. Specifying "pandas one-liner" prevents it from writing a 30-line function when you need something quick. For more on this framework, see prompt engineering explained and how to get specific answers from AI.

Prompts for Data Cleaning and Quality Checks

For data cleaning tasks, the most effective ChatGPT prompts specify the dataset schema (column names and types), the downstream use case (why data quality matters here), and the output format (code, a ranked issue list, a summary table). ChatGPT cannot see your actual data unless you paste it in, so the (Context) element — describing the structure and business meaning of the data — does most of the heavy lifting. A well-formed cleaning prompt produces actionable detection code and prioritized fixes, not a generic checklist.

These two templates cover the most common cleaning scenarios: a full audit pass and targeted outlier/anomaly detection. Both work whether you paste a sample of your data directly or describe the schema.

Template 1 — Data Cleaning Audit

(Role) You are a data engineer reviewing a dataset before it enters a production pipeline.
(Context) Dataset: [describe table name or file]. Columns: [list column names, types, and any known constraints, e.g., customer_id (int, should be unique), signup_date (YYYY-MM-DD), plan_tier (string: Free / Pro / Enterprise), monthly_revenue (float, USD, should be non-negative)]. This data will be used for [describe purpose: e.g., a churn prediction model / a monthly revenue report / a customer segmentation analysis].
(Task) Identify the five most critical data quality issues that could cause errors or misleading results in the downstream use case. For each issue: state the problem, explain the business or modeling risk, and provide a pandas code snippet to detect affected rows.
(Format) Numbered list, 5 items. Structure each item: Issue (bold heading) → Risk (1 sentence) → Detection code (python code block). Sort by severity, highest risk first.
Elements: (Role) data engineer + (Context) schema + use case + (Task) top-5 issues by severity + (Format) ranked numbered list with code
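For a sense of what the detection code returned by Template 1 might look like, here is a minimal sketch against the example schema from the template. The sample rows are invented, and ranking by affected-row count is only a stand-in for the severity ranking the prompt asks ChatGPT to reason about.

```python
import pandas as pd

# Hypothetical rows matching the example schema in Template 1.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4, 5],
    "signup_date": ["2025-01-10", "2025-02-30", "2025-03-05", None, "2025-04-01"],
    "plan_tier": ["Free", "Pro", "Pro", "enterprise", "Enterprise"],
    "monthly_revenue": [0.0, 49.0, 49.0, -10.0, 499.0],
})

# errors="coerce" turns missing or impossible dates (such as Feb 30) into NaT.
parsed_dates = pd.to_datetime(df["signup_date"], format="%Y-%m-%d", errors="coerce")

# One boolean mask per constraint stated in the schema description.
issues = {
    "duplicate customer_id": df["customer_id"].duplicated(keep=False),
    "missing or invalid signup_date": parsed_dates.isna(),
    "unknown plan_tier": ~df["plan_tier"].isin(["Free", "Pro", "Enterprise"]),
    "negative monthly_revenue": df["monthly_revenue"] < 0,
}

# Count affected rows per issue and list the most widespread first.
report = pd.Series(
    {name: int(mask.sum()) for name, mask in issues.items()},
    name="rows_affected",
).sort_values(ascending=False)
print(report)
```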

Template 2 — Outlier and Anomaly Detection

(Role) You are a quantitative analyst with expertise in statistical data validation.
(Context) I have a [describe dataset: e.g., sales transactions / sensor readings / survey responses] dataset with [N] rows. The key numeric columns are: [column name 1] (expected range: [min–max]), [column name 2] (expected range: [min–max]). The data covers [time period or population]. Outliers could indicate [data entry errors / genuine anomalies / fraud / equipment failure; choose what applies].
(Task) Recommend the most appropriate outlier detection method for this data type and write Python code implementing it. For each numeric column, flag rows that are likely errors versus rows that are likely genuine anomalies worth investigating.
(Format) First, a 2-sentence rationale for the detection method chosen. Then Python code that adds a flag column to the dataframe for each suspicious row. Finally, a 3-row example of what the flagged output looks like (you can use placeholder values).
Elements: (Role) quant analyst + (Context) data type, columns, expected ranges + (Task) method choice + code + (Format) rationale + code + example output
Tip: You do not need to paste your full dataset. Paste the first 5–10 rows as a CSV sample alongside the schema description. ChatGPT will write correct, column-specific code from a small sample — and avoids the risk of exposing sensitive data unnecessarily.
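Template 2 asks for code that adds a flag column and separates likely errors from genuine anomalies. One way that could look is sketched below, using the common 1.5 × IQR fence rule as the detection method. The method choice, the sample values, and the expected range of 0 to 1,000 are assumptions made for illustration, and ChatGPT may reasonably recommend a different method for your data.

```python
import pandas as pd

# Hypothetical numeric column with one extreme value and one impossible value.
df = pd.DataFrame({"amount": [20.0, 22.0, 25.0, 21.0, 23.0, 24.0, 400.0, -5.0]})

# Assumed valid range, as you would state it in the (Context) element.
expected_min, expected_max = 0.0, 1000.0

# IQR fences: values more than 1.5 * IQR beyond the quartiles are unusual.
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outside_fences = (df["amount"] < lower) | (df["amount"] > upper)
outside_expected = ~df["amount"].between(expected_min, expected_max)

# Unusual but possible values are worth investigating; impossible ones are
# treated as likely errors. The second assignment takes precedence.
df["flag"] = "ok"
df.loc[outside_fences, "flag"] = "anomaly_to_investigate"
df.loc[outside_expected, "flag"] = "likely_error"
print(df)
```

Stating the expected range in the prompt is what lets the code distinguish the two flag types; without it, every fence violation looks the same.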

Prompts for Exploratory Data Analysis and Visualization

For EDA and visualization, ChatGPT is most useful as a workflow planner and code generator — not as a data viewer. Tell it your data's shape and research question, and it will suggest what to look at first, which visualizations reveal which patterns, and write the corresponding Python, R, or Excel chart code. The critical detail in these prompts is specifying your analysis goal: "understand customer churn" and "understand seasonal revenue patterns" call for completely different EDA paths and chart types.

These two templates cover structured EDA planning (what to look at and in what order) and chart/visualization code generation (given a data pattern, write the code for the right chart).

Template 3 — EDA Structured Analysis Plan

(Role) You are a senior data analyst experienced in exploratory data analysis for [business domain: e.g., e-commerce / healthcare / SaaS / financial services].
(Context) I have a dataset with [N] rows and the following columns: [list all columns with types and what they represent]. My main research question is: [state the specific question, e.g., "Which customer segments have the highest 90-day churn rate?" / "What factors predict a high average order value?"]. I will be working in Python with pandas and matplotlib/seaborn.
(Task) Design a step-by-step EDA plan. For each step: (1) what to examine, (2) why it matters for my research question, and (3) the exact pandas/matplotlib code to run it.
(Format) Numbered steps (aim for 6–8). Each step: Step title (bold) → Why it matters (1 sentence) → Code block. End with a short paragraph noting what follow-up analysis the EDA results will likely suggest.
Elements: (Role) senior analyst in domain + (Context) schema + research question + tool stack + (Task) 6–8 EDA steps with rationale + code + (Format) numbered + code blocks + follow-up note
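The first few steps of a plan produced by Template 3 often resemble the sketch below, shown here for the churn question used as an example in the template. The column names and rows are hypothetical, and the sketch sticks to pandas so it runs without a plotting library.

```python
import pandas as pd

# Hypothetical customer table for the question:
# "Which customer segments have the highest 90-day churn rate?"
df = pd.DataFrame({
    "segment": ["SMB", "SMB", "SMB", "Mid", "Mid", "Ent", "Ent", "Ent"],
    "monthly_spend": [40.0, 55.0, None, 300.0, 280.0, 2100.0, 1900.0, 2500.0],
    "churned_90d": [1, 1, 0, 0, 1, 0, 0, 0],
})

# Step 1: shape and types, to confirm the data loaded as expected.
print(df.shape)
print(df.dtypes)

# Step 2: share of missing values per column, since gaps can bias group rates.
missing_share = df.isna().mean()
print(missing_share)

# Step 3: churn rate and group size per segment, highest churn first.
churn_by_segment = (
    df.groupby("segment")["churned_90d"]
    .agg(churn_rate="mean", customers="size")
    .sort_values("churn_rate", ascending=False)
)
print(churn_by_segment)
```

Reporting group size next to the rate matters because a high churn rate in a tiny segment is weaker evidence than the same rate in a large one.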

Template 4 — Chart Type Selector and Code Generator

(Role) You are a data visualization specialist with expertise in communicating analytical findings to non-technical stakeholders.
(Context) I need to visualize the following relationship: [describe what you want to show, e.g., "how monthly revenue changed over 12 months, broken down by product category" / "the distribution of customer ages across three plan tiers" / "the correlation between ad spend and conversion rate across 50 campaigns"]. My audience is [e.g., executives who are not data-literate / a technical product team / investors]. I am using [Python matplotlib/seaborn / R ggplot2 / Excel / Tableau; choose one].
(Task) Recommend the single best chart type for this data and audience combination. Explain in one sentence why it is better than the two most obvious alternatives. Then write complete, runnable code that produces a publication-quality version of the chart using placeholder data structured like mine.
(Format) Lead with: "Best chart type: [name], because [one-sentence reason over alternatives]." Then the full code block. Then a 2-sentence note on any design choices in the code (color, labeling, axes) that were made specifically for the non-technical audience constraint.
Elements: (Role) viz specialist + (Context) what to show + audience + tool + (Task) best chart + rationale over alternatives + code + (Format) verdict sentence + code + design note

Chart Type Quick-Reference by Data Pattern

| Data Pattern / Goal | Recommended Chart | Avoid | Prompt Keyword to Use |
| --- | --- | --- | --- |
| Change over time (single series) | Line chart | Bar chart (implies discrete periods) | "trend over time, single metric" |
| Change over time (multiple categories) | Small multiples line chart or stacked area | Pie chart (impossible to compare change) | "multi-category trend, 12-month" |
| Part-to-whole (5 categories or fewer) | Horizontal bar chart (sorted) | Pie chart (hard to read beyond 3 slices) | "share breakdown, ranked proportions" |
| Distribution of a numeric variable | Histogram or violin plot | Line chart (misrepresents frequency) | "distribution, single continuous variable" |
| Relationship between two numeric variables | Scatter plot (with trend line if appropriate) | Line chart (implies time ordering) | "correlation, two continuous variables" |
| Comparing groups (3+ categories) | Grouped bar chart or box plot | Stacked bar (hard to compare inner segments) | "group comparison, multiple categories" |
| Geographic variation | Choropleth map or dot map | Bar chart (loses geographic context) | "regional variation, geographic distribution" |
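If you reuse this quick reference often, it can be kept as a small lookup that fills the (Context) line of Template 4 and records which chart you expect back. The sketch below is one possible shape: the dictionary keys and the `chart_brief` helper are names invented here, while the chart names and keywords come from the table.

```python
# Rows taken from the quick-reference table; the keys are shorthand labels
# invented for this sketch.
CHART_GUIDE = {
    "trend_single": ("Line chart", "trend over time, single metric"),
    "trend_multi": ("Small multiples line chart or stacked area",
                    "multi-category trend, 12-month"),
    "part_to_whole": ("Horizontal bar chart (sorted)",
                      "share breakdown, ranked proportions"),
    "distribution": ("Histogram or violin plot",
                     "distribution, single continuous variable"),
    "relationship": ("Scatter plot", "correlation, two continuous variables"),
    "group_comparison": ("Grouped bar chart or box plot",
                         "group comparison, multiple categories"),
    "geographic": ("Choropleth map or dot map",
                   "regional variation, geographic distribution"),
}


def chart_brief(pattern: str, subject: str) -> dict:
    """Build the (Context) line plus the chart type to expect in the reply."""
    chart, keyword = CHART_GUIDE[pattern]
    return {
        "context_line": (
            f"I need to visualize the following relationship: {subject} ({keyword})."
        ),
        "expected_chart": chart,
    }


brief = chart_brief("trend_single", "monthly revenue over the last 12 months")
print(brief["context_line"])
print(brief["expected_chart"])
```

Comparing `expected_chart` with what ChatGPT recommends is a quick check: a different answer is not necessarily wrong, but it is worth asking for the reasoning.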

For more on structuring iterative analysis workflows with AI, see how to use AI for research and the guide on ChatGPT prompts for coding — the pandas code generation patterns there transfer directly to data analysis tasks.

Prompts for Statistical Interpretation and Insight Summaries

Statistical output — regression tables, p-values, confidence intervals, ANOVA results — is where ChatGPT adds the most value for analysts who understand the numbers but need to communicate findings to non-technical stakeholders. An effective interpretation prompt pastes the actual model output (or a realistic representation of it) directly into the prompt, specifies the audience, and asks for a precise communication format. The result is a plain-language narrative that is accurate to the numbers, not a generic explanation of what p-values mean.

A 2022 GitHub study cited in Microsoft's New Future of Work Report found that developers using an AI code assistant completed a coding task about 55% faster on average. That result measured coding rather than reporting, but translating statistical output into stakeholder narratives is a similarly time-intensive task, and a well-structured ChatGPT interpretation prompt may offer comparable leverage.

Template 5 — Statistical Results Interpreter

(Role) You are a statistician who specializes in translating quantitative findings for non-technical business audiences.
(Context) I ran a [type of analysis: e.g., multiple linear regression / logistic regression / one-way ANOVA / Pearson correlation] to answer this question: [state the research question]. The output is below:
[Paste your actual model output here, e.g., regression coefficients table, ANOVA table, or correlation matrix]
The audience for this interpretation is [e.g., senior marketing managers with no statistics background / a product team that knows basic stats / a CFO].
(Task) Interpret the key findings in plain language. Focus on: (1) the main answer to the research question, (2) the 2–3 most important individual findings, (3) any important caveats or limitations the audience should know before acting on these results.
(Format) Start with a one-sentence headline finding. Then three numbered points. End with a single "What this means for [business decision]" sentence. No statistical jargon in the final output. Use plain-language equivalents (e.g., "statistically meaningful" instead of "significant at p < 0.05").
Elements: (Role) statistician for non-technical audiences + (Context) analysis type + pasted output + audience + (Task) main finding + top 3 + caveats + (Format) headline + 3 points + so-what sentence, no jargon

Template 6 — Insight Narrative Generator

(Role) You are a business intelligence analyst writing the narrative section of a quarterly data report.
(Context) I have completed the following analysis: [describe what you did, e.g., "analyzed 6 months of customer support ticket data, segmented by issue type, product line, and resolution time"]. The key findings are:
- [Finding 1: e.g., "Average resolution time is 47% longer for Product Line B than Product Line A"]
- [Finding 2: e.g., "Billing issues account for 38% of all tickets but only 12% of escalations"]
- [Finding 3: e.g., "Tickets submitted on Mondays take 22% longer to resolve than mid-week tickets"]
The audience is [describe: e.g., the VP of Customer Experience, who will present this to the board next week].
(Task) Write the narrative section of the report that presents these findings as a coherent story. The narrative should: (1) open with the most surprising or actionable finding, (2) explain what the data shows and why it matters, (3) end with a clear implication for the team's next decision.
(Format) 3–4 paragraphs, professional prose tone. No bullet points in the narrative itself, because this will be read aloud. Aim for 200–250 words. Do not introduce findings not listed above.
Elements: (Role) BI analyst writing a report + (Context) analysis done + findings listed + audience + (Task) coherent narrative with story arc + (Format) 3–4 paragraphs, prose, 200–250 words, no new findings

Interpretation Prompt in Action: Regression Output to Plain Language

Template 5 — Statistical Interpreter (Example Exchange)
You

(Role) You are a statistician who specializes in translating quantitative findings for non-technical business audiences.
(Context) I ran a multiple linear regression to answer: "Which factors most strongly predict a customer's annual contract value?" Output below:

Dependent variable: Annual Contract Value (USD)
R-squared: 0.61 | Adjusted R-squared: 0.58 | n = 412

Variable            Coef     Std Err   p-value
Intercept           4,200    610       <0.001
Seats (count)       38.4     4.2       <0.001
Industry_SaaS       6,100    980       <0.001
Annual_visits       1.2      0.9       0.18
Acq_channel_Paid    1,400    720       0.053

Audience: VP of Sales, no statistics background.
(Task) Interpret key findings in plain language. (1) Main answer to research question. (2) 2–3 most important individual findings. (3) Caveats.
(Format) One-sentence headline, three numbered points, one "What this means for sales prioritization" sentence. No jargon.

ChatGPT

Headline: Seat count and industry type are the two factors that most strongly predict how large a customer's annual contract will be. The model as a whole explains about 60% of the variation in contract value across your customer base.

1. Every additional seat adds about $38 to annual contract value. This relationship is extremely reliable — the data is very consistent on this point. For a 100-seat deal versus a 50-seat deal, you'd expect roughly $1,920 more in annual revenue, all else equal.

2. SaaS-industry customers are worth about $6,100 more per year than customers in other industries, on average. This is a large and reliable difference — it should factor into how you prioritize sales outreach by vertical.

3. Website visit frequency and paid acquisition channel showed weaker, less reliable effects — they may matter, but the data doesn't give us enough confidence to act on them yet.

Caveat: This model explains about 60% of contract value variation, which means 40% is driven by factors not in this analysis — possibly negotiation history, deal tenure, or product configuration.

What this means for sales prioritization: Focus outreach on SaaS-industry prospects with high projected seat counts — these two factors alone are the strongest predictors of large contracts your team has available today.

(Role) statistician + (Context) regression output pasted + audience = non-technical VP + (Task) main finding + top findings + caveats + (Format) headline + numbered + so-what
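Before forwarding an interpretation like this one, it is worth checking the reply against the numbers you pasted. The sketch below recomputes the seat arithmetic and approximates each p-value from the coefficient and standard error. It uses a normal approximation to the t distribution, which is an assumption that is reasonable at n = 412 but will not match the table to the last digit.

```python
import math

# Coefficient and standard error for each predictor, copied from the
# regression table in the example exchange.
rows = {
    "Seats (count)": (38.4, 4.2),
    "Industry_SaaS": (6100.0, 980.0),
    "Annual_visits": (1.2, 0.9),
    "Acq_channel_Paid": (1400.0, 720.0),
}


def approx_p_value(coef: float, std_err: float) -> float:
    """Two-sided p-value using a normal approximation to the t statistic."""
    t_stat = coef / std_err
    return math.erfc(abs(t_stat) / math.sqrt(2))


for name, (coef, std_err) in rows.items():
    t_stat = coef / std_err
    print(f"{name:18s} t = {t_stat:6.2f}   p ~ {approx_p_value(coef, std_err):.3f}")

# The "100-seat versus 50-seat" claim: 50 extra seats times the seat coefficient.
seat_difference = round((100 - 50) * rows["Seats (count)"][0], 2)
print(seat_difference)
```

The approximated p-values for Annual_visits and Acq_channel_Paid land close to the reported 0.18 and 0.053, which supports describing both as weaker, less reliable effects.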

Advanced Workflows — CSV/Excel, Multi-Step Analysis, and Iteration

For CSV/Excel formula work and multi-step analysis pipelines, the most effective prompts treat ChatGPT as a workflow co-pilot rather than a one-shot answer machine. Describe the transformation you need, the tool you are using, and the exact input and output structure you expect. For iterative analysis — where the output of step 1 informs step 2 — brief ChatGPT on the full analysis arc at the start so it can maintain consistent variable names, assumptions, and formatting conventions across follow-up messages.

The Stanford HAI AI Index 2024 found that AI adoption in analytical roles is outpacing adoption in creative roles by roughly 2:1 in organizational surveys, which suggests that data and analysis workflows are where organizations are putting AI tools to work most readily right now.

Template 7 — Full Analysis Workflow Orchestrator

(Role) You are a data analyst who builds end-to-end analytical pipelines in [Python with pandas / SQL / Excel; choose one].
(Context) I am analyzing [describe dataset and source: e.g., "a CSV of 8,000 customer transactions from our CRM, exported for Q2 2026"]. Columns: [list all columns, types, and what they represent]. My final deliverable is: [describe what you need at the end, e.g., "a one-page summary table showing monthly revenue, new customers, and churn rate by region, ready to copy into a slide deck"].
(Task) Design and write the complete analysis workflow from raw data to final output. The workflow should: (1) load and validate the data, (2) clean the key issues (missing dates, duplicate IDs, negative amounts), (3) compute the required metrics, (4) produce the final output structure. Write this as a single, complete, runnable script with comments explaining each step.
(Format) Full Python script (or SQL query block, or Excel step-by-step instructions; match my tool choice). Use clear variable names that reflect what the data represents. Add a 3-line comment block at the top summarizing what the script does. At the end of the script, print a 5-row preview of the final output.
Elements: (Role) end-to-end pipeline analyst + (Context) dataset + columns + final deliverable + (Task) complete workflow load→clean→compute→output + (Format) runnable script with comments + preview print
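A script returned by Template 7 would typically follow the load, clean, compute, output arc sketched below. Everything specific here is an assumption for illustration: the inline CSV stands in for your export, the column names are hypothetical, and the cleaning rules (drop exact duplicate rows, drop rows without a date, drop negative amounts) are one reading of the template's cleaning step that you should adapt to your own data.

```python
# Loads a small transaction extract, removes rows that fail basic checks,
# and summarizes revenue and unique customers by month and region.
# Sample data is inline so the sketch runs without external files.
import io

import pandas as pd

RAW_CSV = """customer_id,purchase_date,amount,region
1,2026-04-03,120.00,North
2,2026-04-18,80.00,South
2,2026-04-18,80.00,South
3,,45.00,North
4,2026-05-09,-30.00,East
5,2026-05-21,200.00,North
6,2026-06-02,60.00,South
"""

# (1) Load and validate: confirm the expected columns are present.
transactions = pd.read_csv(io.StringIO(RAW_CSV), parse_dates=["purchase_date"])
required_columns = {"customer_id", "purchase_date", "amount", "region"}
missing_columns = required_columns - set(transactions.columns)
if missing_columns:
    raise ValueError(f"missing columns: {sorted(missing_columns)}")

# (2) Clean: exact duplicate rows, rows without a date, negative amounts.
clean = (
    transactions.drop_duplicates()
    .dropna(subset=["purchase_date"])
    .query("amount >= 0")
    .copy()
)

# (3) Compute: revenue and unique customers per month and region.
clean["month"] = clean["purchase_date"].dt.to_period("M").astype(str)
summary = (
    clean.groupby(["month", "region"], as_index=False)
    .agg(revenue=("amount", "sum"), customers=("customer_id", "nunique"))
    .sort_values(["month", "region"])
    .reset_index(drop=True)
)

# (4) Output: preview of the final table.
print(summary.head(5))
```

Note that the sketch removes exact duplicate rows rather than repeated customer IDs, since a returning customer is legitimate in transaction data.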

How to Iterate After Your First Response

The four-element structure works for follow-up prompts too. After getting an initial analysis, use these patterns to refine:

| Iteration Goal | Follow-Up Prompt Pattern | What It Does |
| --- | --- | --- |
| Drill down on one finding | "Looking only at [segment/group from previous output], repeat the analysis broken down by [new dimension]." | Maintains all prior context, narrows focus |
| Change output format | "Reformat the previous output as [a table / a bullet list / a 3-sentence executive summary]. Keep the same findings." | Repurposes analysis without re-running it |
| Add a comparison baseline | "Add a comparison column showing Q1 values alongside the Q2 values in the previous table. Highlight the top-3 changes." | Contextualizes findings against prior period |
| Stress-test an assumption | "What if [assumption from previous analysis] is wrong? How would the conclusion change?" | Tests robustness without redoing full analysis |
| Generate follow-up questions | "Based on the findings above, what are the three most important questions this analysis leaves unanswered?" | Surfaces gaps and next steps automatically |
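If you script your follow-ups, the patterns above can be stored as fill-in templates so the wording stays consistent across a session. This is only a sketch: the dictionary keys and the `follow_up` helper are hypothetical names, and the placeholder names in braces are assumptions standing in for the bracketed slots in the table.

```python
# Follow-up patterns from the table, with the bracketed slots turned into
# named placeholders chosen for this sketch.
FOLLOW_UP_PATTERNS = {
    "drill_down": ("Looking only at {segment}, repeat the analysis "
                   "broken down by {dimension}."),
    "reformat": ("Reformat the previous output as {output_format}. "
                 "Keep the same findings."),
    "stress_test": ("What if {assumption} is wrong? "
                    "How would the conclusion change?"),
    "open_questions": ("Based on the findings above, what are the three most "
                       "important questions this analysis leaves unanswered?"),
}


def follow_up(kind: str, **details: str) -> str:
    """Fill one pattern; str.format raises KeyError if a slot is left out."""
    return FOLLOW_UP_PATTERNS[kind].format(**details)


# The session starts with the full four-element brief, then appends follow-ups.
conversation = ["<your four-element brief goes here>"]
conversation.append(
    follow_up("drill_down", segment="the West region", dimension="product_category")
)
conversation.append(follow_up("reformat", output_format="a 3-sentence executive summary"))
print("\n".join(conversation[1:]))
```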

For more on iterative prompting techniques, see few-shot prompting explained — the same "show me an example, then apply it" pattern works well for multi-step data analysis workflows. If you use ChatGPT for work tasks beyond data analysis, ChatGPT prompts for work covers the broader professional context.

Figure: The four-element prompt structure applies at every stage of a data analysis project, from cleaning to communication.

Frequently Asked Questions

Can ChatGPT actually analyze data — or just talk about it?

ChatGPT can analyze data in two ways: (1) if you paste a small dataset (a few hundred rows or fewer) directly into the chat, it can describe patterns, compute summary statistics, and identify anomalies in that pasted data; (2) it can write analysis code (Python, SQL, R, Excel formulas) that you run yourself on larger datasets. It does not have direct access to your files or databases unless you use the Code Interpreter feature (since renamed Advanced Data Analysis), introduced in ChatGPT Plus, which allows it to execute Python against uploaded files. For most professional use cases, the most reliable workflow is using ChatGPT to write and explain the code, then running it in your own environment.

How do I share my data with ChatGPT safely?

The safest approaches, in order: (1) describe the schema and paste only a 5–10 row anonymized sample — ChatGPT will write correct column-specific code from a small example; (2) use ChatGPT's Code Interpreter with a de-identified file (remove names, emails, IDs before uploading); (3) for sensitive enterprise data, use the ChatGPT Enterprise tier (OpenAI commits not to use Enterprise conversations for training) or a locally hosted model. Never paste full production datasets with real customer PII into the free or Plus tiers.
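Preparing the small de-identified sample from approach (1) might look like the sketch below. The column names, the list of PII columns, and the salt value are all placeholders. Salted hashing is pseudonymization rather than full anonymization, so treat this as a starting point and follow your organization's own data-handling rules.

```python
import hashlib

import pandas as pd

# Hypothetical production-like rows containing direct identifiers.
df = pd.DataFrame({
    "customer_id": [1001, 1002, 1003],
    "name": ["Ana Ruiz", "Ben Cole", "Chi Tran"],
    "email": ["ana@example.com", "ben@example.com", "chi@example.com"],
    "amount": [59.90, 120.00, 35.50],
    "region": ["North", "East", "West"],
})

PII_COLUMNS = ["name", "email"]          # columns to drop entirely
SALT = "replace-with-a-secret-value"     # placeholder; keep the real one private


def pseudonymize(value) -> str:
    """Replace an identifier with a short salted SHA-256 digest."""
    digest = hashlib.sha256(f"{SALT}:{value}".encode("utf-8")).hexdigest()
    return digest[:10]


# Drop direct identifiers, keep at most 10 rows, and mask the ID column.
sample = df.drop(columns=PII_COLUMNS).head(10).copy()
sample["customer_id"] = sample["customer_id"].map(pseudonymize)

# CSV text that can be pasted alongside the schema description.
sample_csv = sample.to_csv(index=False)
print(sample_csv)
```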

What is the best way to describe a dataset to ChatGPT without uploading it?

Give ChatGPT: the column names, their data types (int, string, float, date), their meaning in plain language, the approximate row count, and one representative example row if possible. That is enough context for it to write accurate, column-specific analysis code. Adding the downstream use case ("this will be used for a churn model") makes the output even more targeted. You do not need to paste all your data — schema plus sample plus goal is sufficient for most tasks.
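The schema-plus-sample-plus-goal description can be generated from a dataframe rather than typed by hand. The helper below is a hypothetical sketch: `describe_for_prompt` and its parameters are names made up here, and the column meanings still have to come from you, because pandas only knows the types.

```python
import pandas as pd


def describe_for_prompt(df: pd.DataFrame, total_rows: int,
                        meanings: dict, use_case: str) -> str:
    """Build a plain-text dataset description for the (Context) element."""
    lines = [f"Dataset: about {total_rows:,} rows. Columns:"]
    for column, dtype in df.dtypes.items():
        meaning = meanings.get(column, "meaning not documented")
        lines.append(f"- {column} ({dtype}): {meaning}")
    # One representative row, written as comma-separated values.
    example = df.head(1).to_csv(index=False, header=False).strip()
    lines.append(f"Example row: {example}")
    lines.append(f"Downstream use: {use_case}")
    return "\n".join(lines)


# Hypothetical two-row sample standing in for a much larger table.
sample = pd.DataFrame({
    "customer_id": [1, 2],
    "plan_tier": ["Pro", "Free"],
    "monthly_revenue": [49.0, 0.0],
})

context_text = describe_for_prompt(
    sample,
    total_rows=15000,
    meanings={
        "customer_id": "unique customer identifier",
        "plan_tier": "subscription plan (Free / Pro / Enterprise)",
        "monthly_revenue": "recurring revenue in USD",
    },
    use_case="this will be used for a churn model",
)
print(context_text)
```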

Can ChatGPT write Python pandas code for data analysis?

Yes. Python pandas is one of ChatGPT's strongest outputs, and the code quality improves significantly when you specify the schema and task precisely using the four-element structure. It can generate working code for groupby operations, pivot tables, merges, rolling averages, and most common pandas transformations. For complex pipelines, generate the code in sections and test each section before asking for the next. If the code produces an error, paste the error message back to ChatGPT. It can usually fix straightforward errors, but rerun the corrected code to confirm.

How do I get ChatGPT to interpret statistical output — p-values, R-squared, confidence intervals?

Paste the actual statistical output directly into the prompt (a regression table, ANOVA results, or correlation matrix). Then specify your audience: "explain this to a VP with no statistics background" produces plain-language narrative; "explain this to a data scientist reviewing my methodology" produces technical critique. The key is the (Context) element — pasting real numbers instead of describing them in general terms produces much more accurate and specific interpretation. Template 5 above gives a complete structure for this.

Is ChatGPT better than dedicated data analysis tools like Excel or Python?

ChatGPT is a complement to these tools, not a replacement. Excel and Python execute analysis — ChatGPT helps you design the analysis, write the code, interpret the results, and communicate findings. The highest-leverage use is to have ChatGPT write the Python or Excel code you then run in your own environment, where you control the data and can verify the output. ChatGPT does not replace the need to understand your data or to validate results — it accelerates the coding and communication steps that take the most time in a typical analysis workflow.
