# Data Analysis Project Guidelines Example
This example guidelines file shows how to structure project-specific guidelines for data analysis projects. Use this as a template when creating your own guidelines file.
## Example Guidelines File
---
title: Customer Behavior Analysis Project Guidelines
version: 1.0.0
created: 2025-04-10
purpose: Provide guidance for working with the customer behavior analysis project
---

# Customer Behavior Analysis Project Guidelines

## Project Purpose and Scope

This data analysis project aims to develop a comprehensive understanding of customer behavior patterns across our e-commerce platform. By analyzing transaction data, browsing history, customer service interactions, and marketing engagement, we will identify the key factors influencing purchase decisions, customer satisfaction, and lifetime value to inform strategic business decisions.

Key objectives:

- Identify customer segments based on purchasing behavior and preferences
- Analyze the customer journey and highlight friction points in the conversion funnel
- Discover correlations between marketing touchpoints and purchase outcomes
- Develop predictive models for customer lifetime value and churn probability
- Create interactive dashboards for ongoing monitoring of customer metrics
- Generate actionable recommendations for improving customer experience and retention

Scope includes:

- Historical transaction data (Jan 2023 to present)
- Website and mobile app usage data
- Customer service interaction logs
- Email and social media marketing engagement metrics
- Product review and ratings data
- Customer demographic information (anonymized)

Scope excludes:

- Individual-level identification (all analysis must be performed on anonymized data)
- Competitor data analysis (focus on our own customers only)
- Financial forecasting (separate finance department project)
- Technical performance analysis (handled by the IT department)

## Data Handling Requirements

### Security and Privacy

1. Data Protection:
   - All analysis must use anonymized data (see data/anonymization-protocol.md)
   - PII must be removed before analysis using the sanitization scripts in scripts/sanitize/
   - Data must never leave the secure analysis environment
   - No data should be downloaded to local machines
   - All exports must contain aggregated results only, never raw data
2. Access Controls:
   - Raw data access is limited to the data engineering team
   - Analysts work with derived datasets in the data/processed/ directory
   - Results sharing follows the protocol in security/sharing-protocol.md
   - External sharing requires executive approval via the process in security/approval-workflow.md
3. Compliance Requirements:
   - All analysis must comply with GDPR and CCPA regulations
   - Data retention follows the schedule in compliance/retention-policy.md
   - Documentation of data lineage is required for all insights
   - Right-to-be-forgotten requests are handled via scripts/gdpr/forget-user.py

### Data Quality Standards

1. Data Validation:
   - All datasets must pass the validation checks in validation/data-quality-checks.py (a sketch of this kind of check follows this section)
   - Outliers must be documented and addressed according to outlier-handling.md
   - Missing values are handled according to missing-value-protocol.md
   - Data profiling is required before analysis (use scripts/profiling/profile-data.py)
2. Data Transformation:
   - Use the standard transformation pipelines in pipelines/transform/
   - Document all transformations in data/transformations/transformation-log.md
   - Preserve original values when transforming (create new columns)
   - Use consistent naming conventions per naming-convention.md
3. Access Patterns:
   - Read from: data/raw/, data/processed/, config/, reference/
   - Write to: analysis/, outputs/, models/, visualizations/
   - Never modify: data/raw/, logs/, audit/, security/
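The contents of validation/data-quality-checks.py are project-specific, but a minimal check in its spirit might look like the following sketch. The function name, column names, and thresholds here are illustrative assumptions, not part of the example project:

```python
import pandas as pd

def validate_transactions(df: pd.DataFrame) -> list[str]:
    """Run basic quality checks on a transactions dataset.

    Returns a list of human-readable failure messages;
    an empty list means all checks passed.
    """
    failures = []

    # Required columns (illustrative schema)
    required = {"customer_id", "order_id", "order_date", "amount"}
    missing_cols = required - set(df.columns)
    if missing_cols:
        failures.append(f"missing columns: {sorted(missing_cols)}")
        return failures  # the remaining checks depend on these columns

    # Order IDs must be unique
    if df["order_id"].duplicated().any():
        failures.append("duplicate order_id values found")

    # Transaction amounts must be non-negative
    if (df["amount"] < 0).any():
        failures.append("negative amounts found")

    # Flag excessive missingness (the 1% threshold is an assumption)
    null_ratio = df["customer_id"].isna().mean()
    if null_ratio > 0.01:
        failures.append(f"customer_id is {null_ratio:.1%} null (limit 1%)")

    return failures
```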
## Analysis Methodology

### Data Exploration

1. Exploratory Data Analysis (EDA):
   - Begin all analysis with structured EDA (use templates/eda-template.ipynb)
   - Document discoveries in the analysis/eda/ directory
   - Generate standard statistical summaries (use scripts/stats/summarize.py)
   - Create exploration visualizations using the visualization standards below
2. Feature Engineering:
   - Document all created features in features/feature-registry.md
   - Use the standard feature engineering pipelines in pipelines/features/
   - Test feature importance before extensive use
   - Reuse established features from features/established-features.md when possible

### Analytical Techniques

1. Segmentation Analysis (a minimal sketch follows this section):
   - Use K-means clustering as the primary segmentation technique
   - Validate segments with silhouette analysis
   - Minimum segment size: 5% of the customer base
   - Document segment characteristics in templates/segment-profile.md
2. Funnel Analysis:
   - Define conversion stages in reference/conversion-stages.md
   - Calculate standard conversion metrics using scripts/funnels/metrics.py
   - Perform cohort analysis using templates/cohort-analysis.ipynb
   - Identify drop-off points and quantify their impact
3. Predictive Modeling:
   - Follow the modeling workflow in workflows/modeling-workflow.md
   - Use scripts/models/train.py for standard model training
   - Document model parameters in models/model-registry.md
   - Validate models using cross-validation and holdout test sets
   - Log model performance metrics in models/performance-log.md
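A minimal scikit-learn sketch of the segmentation approach described above: scale the features, fit K-means for a range of k, pick the k with the best silhouette score, and warn when a segment falls below the 5% minimum. The helper name, feature columns, and candidate k range are illustrative assumptions:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def segment_customers(features: pd.DataFrame, k_range=range(2, 9)):
    """Cluster customers with K-means, choosing k by silhouette score."""
    X = StandardScaler().fit_transform(features)

    best_k, best_score, best_labels = None, -1.0, None
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels

    # Enforce the minimum segment size from the guidelines (5% of customers)
    sizes = pd.Series(best_labels).value_counts(normalize=True)
    if (sizes < 0.05).any():
        print(f"warning: k={best_k} produces segments below the 5% minimum")

    return best_k, best_score, best_labels
```

The chosen segments would then be profiled per templates/segment-profile.md; in practice the candidate k range and feature set come from the EDA step.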
## Visualization Standards

### Chart Types and Usage

1. Appropriate Chart Selection:
   - Time series data: line charts, area charts
   - Categorical comparisons: bar charts, dot plots
   - Distributions: histograms, box plots, violin plots
   - Relationships: scatter plots, bubble charts, heatmaps
   - Compositions: stacked bars, pie charts (limited use), treemaps
   - Flows: Sankey diagrams, network graphs
2. Design Principles (illustrated in the sketch following this example file):
   - Minimize chart junk (no 3D, minimal gridlines)
   - Use appropriate scales (start the y-axis at zero for bar charts)
   - Ensure sufficient contrast for accessibility
   - Label directly rather than using legends where possible
   - Use the consistent color scheme from visualization/color-palette.md

### Dashboard Design

1. Layout Guidelines:
   - Follow the templates in visualization/dashboard-templates/
   - Organize by business question, not by data source
   - Place the most important metrics in the top left (F-pattern reading)
   - Group related visualizations together
   - Use progressive disclosure for complex information
2. Interactivity Standards:
   - Consistent filter controls across dashboards
   - Tooltips for additional context
   - Date range selectors standardized across projects
   - Mobile-responsive design required
   - Loading indicators for operations longer than one second
3. Technical Implementation:
   - Use Tableau for executive dashboards
   - Use Python (Plotly Dash) for analytical dashboards
   - Implement caching for large datasets
   - Optimize query performance for sub-second results
   - Schedule automatic refreshes based on data update frequency

## Reporting and Communication

### Documentation Requirements

1. Analysis Documentation:
   - Use Jupyter notebooks with markdown documentation
   - Follow the template in templates/analysis-template.ipynb
   - Include purpose, methods, assumptions, and limitations
   - Document data sources and transformations
   - Provide interpretation of results and their business implications
2. Code Documentation:
   - Comment all functions with docstrings
   - Create README files for all directories
   - Document parameters and return values
   - Explain complex algorithms with comments and references
   - Update documentation when code changes

### Presentation Guidelines

1. Executive Summaries:
   - One-page summaries for key findings
   - Focus on business implications, not technical details
   - Include clear recommendations and next steps
   - Quantify business impact where possible
   - Use templates/executive-summary-template.pptx
2. Technical Presentations:
   - Include a methodology section for technical audiences
   - Provide an appendix with additional technical details
   - Balance detail with clarity
   - Include reproducibility information
   - Use templates/technical-presentation-template.pptx

## Tools and Technologies

### Standard Toolset

1. Data Processing:
   - Python (pandas, NumPy) for data manipulation
   - SQL for database queries
   - dbt for data transformations
   - Apache Spark for large-scale processing
2. Analysis and Modeling:
   - Python (scikit-learn, statsmodels) for statistical analysis
   - Python (TensorFlow, PyTorch) for deep learning when needed
   - R for specialized statistical techniques
   - KNIME for visual workflow automation
3. Visualization:
   - Python (matplotlib, seaborn) for exploratory visualizations
   - Tableau for business dashboards
   - Plotly for interactive visualizations
   - D3.js for custom web visualizations

### Environment Setup

1. Development Environment:
   - Use the standard Docker container in environments/analysis-container
   - Virtual environment specifications are in requirements.txt
   - Run the setup script setup/configure-environment.sh
   - Follow the IDE configuration in setup/ide-setup.md
2. Version Control:
   - Keep all code in the Git repository
   - Follow the branching strategy in git/branching-strategy.md
   - Write commit messages according to git/commit-message-guidelines.md
   - Code review is required for all analysis scripts

## Collaboration Workflow

1. Project Planning:
   - Document analysis requests in projects/request-template.md
   - Prioritize using the framework in projects/prioritization-framework.md
   - Break work down into tasks in task-tracking-system
   - Set clear deliverables and deadlines
2. Review Process:
   - Peer review is required for all analyses
   - Use the review template in templates/analysis-review-template.md
   - Review visualizations using the checklist in visualization/review-checklist.md
   - Schedule review meetings for complex analyses
3. Knowledge Sharing:
   - Document findings in knowledge-base
   - Hold weekly team sharing sessions
   - Create reusable components in modules/shared
   - Maintain reference documentation in reference/
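To make the chart design principles in the example concrete, here is a minimal matplotlib sketch of a categorical comparison that applies them: a zero-based y-axis, direct bar labels instead of a legend, and spines and gridlines kept to a minimum. The segment names, values, and color are illustrative, not data from the example project:

```python
import matplotlib.pyplot as plt

# Illustrative data: average order value by customer segment
segments = ["Loyal", "Occasional", "New", "At-risk"]
avg_order_value = [86.40, 54.10, 38.75, 22.30]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(segments, avg_order_value, color="#4C72B0")

# Label bars directly instead of using a legend
ax.bar_label(bars, fmt="$%.2f", padding=3)

# Minimize chart junk: drop top/right spines, keep light y-gridlines only
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.yaxis.grid(True, linewidth=0.5, alpha=0.4)
ax.set_axisbelow(True)

# Bar charts start at zero
ax.set_ylim(bottom=0)

ax.set_title("Average order value by segment")
ax.set_ylabel("Average order value (USD)")
fig.tight_layout()
plt.show()
```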
## Guidelines Key Features
This data analysis project guidelines example demonstrates several best practices:
- **Clear Analysis Objectives**: defines specific analytical goals and expected outcomes
- **Comprehensive Data Governance**: establishes robust security, privacy, and compliance requirements
- **Data Quality Standards**: sets expectations for validation, transformation, and documentation
- **Structured Methodology**: outlines specific analytical techniques and approaches for different tasks
- **Visualization Standards**: provides detailed guidance on chart selection and design principles
- **Dashboard Design Guidelines**: establishes layout and interactivity standards for a consistent user experience
- **Documentation Requirements**: sets expectations for documenting analyses and code
- **Toolset Standardization**: specifies which tools to use for different analytical tasks
## Adapting This Example
When adapting this template for your own data analysis project, focus on:
- Updating the analysis objectives to reflect your specific business questions
- Adjusting data handling requirements to match your organization's security and privacy policies
- Modifying analytical techniques to align with your project's methodological needs
- Customizing visualization standards to match your organization's design system
- Updating the toolset to include the specific technologies used in your environment
- Adapting documentation requirements to match your team's collaboration style
- Adding any domain-specific analytical frameworks relevant to your industry
## Next Steps
Last updated: April 10, 2025