Technical Documentation for Data Science & AI Projects
View the Project on GitHub AlassaneDialDiop/browns_data_science
graph TB
%% Title styling
subgraph Build["<b>Build Catalog (Pre-computed)</b>"]
A[📷 Product Photos] --> B[🖼️ Pre-processing]
B --> C["🧠 AI_EMBED<br/><i>voyage-multimodal-3<br/>1024-dim vectors</i>"]
C --> D[💾 Vector Storage]
end
subgraph V5["<b>V5: Snowflake Native</b>"]
E[📱 st.camera_input] --> F[⚡ Stage Upload]
F --> G["🧠 AI_EMBED<br/><i>voyage-multimodal-3</i>"]
G --> H[🔍 Vector Search]
end
D --> DB[("❄️ Snowflake<br/>Vector Database<br/><i>X products</i>")]
DB --> H
H --> R["👟 <b>Top 5 Products</b><br/><i>Multi-face scoring</i>"]
%% Styling
classDef buildStyle fill:#fff,stroke:#000,stroke-width:2px
classDef v5Style fill:#e3f2fd,stroke:#2196f3,stroke-width:3px
classDef embedStyle fill:#e8eaf6,stroke:#9fa8da,stroke-width:2px
classDef dbStyle fill:#fff3e0,stroke:#ffb74d,stroke-width:3px
classDef matchStyle fill:#f3e5f5,stroke:#ba68c8,stroke-width:2px
classDef resultStyle fill:#e8f5e9,stroke:#66bb6a,stroke-width:3px
classDef featureStyle fill:#f1f8e9,stroke:#8bc34a,stroke-width:2px
class A,B,D buildStyle
class E,F,G,H v5Style
class C,G embedStyle
class DB dbStyle
class R resultStyle
class I,J,K featureStyle
Snowflake Streamlit + st.camera_input – Native camera access directly in the Snowflake environment with instant photo capture. Processing starts automatically as soon as a photo is taken, eliminating the need for external streaming or plugins. Clean black-and-white Montserrat UI for a professional retail environment.
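The capture-then-process flow described above can be sketched as follows. This is a minimal illustration, not the project's actual app.py: the helper name handle_capture, the label text, and the placeholder `process=len` are assumptions made for the example. Streamlit is imported inside main() only so the helper can be exercised without Streamlit installed.

```python
def handle_capture(photo, process):
    """Run `process` on the captured image bytes as soon as a photo exists.

    `photo` is whatever st.camera_input returns: None until the user takes
    a picture, then a file-like object exposing getvalue().
    """
    if photo is None:
        return None
    return process(photo.getvalue())


def main():
    # Imported lazily so handle_capture stays usable without Streamlit.
    import streamlit as st

    photo = st.camera_input("Take a photo of the shoe")  # label is illustrative
    # Streamlit reruns the script when a photo is taken, so this line acts
    # as the "auto-processing" trigger. `len` is a stand-in for the real
    # upload + embed + search pipeline.
    size = handle_capture(photo, process=len)
    if size is not None:
        st.write(f"Captured {size} bytes")
```

In a Streamlit script, main() would be called at the bottom of the file; no button or callback is needed because the rerun after capture delivers the photo.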
Snowflake AI_EMBED (voyage-multimodal-3) – Converts shoe images into 1024-dimensional vectors using Snowflake's native AI functions. voyage-multimodal-3 is a multimodal embedding model well suited to fashion and retail imagery. Images are processed directly within Snowflake's secure environment.
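As a rough sketch of how an embedding call for a staged image might be assembled, the helper below builds the SQL text for AI_EMBED with TO_FILE. The stage name @SHOE_UPLOADS, the file name, the EMBEDDING alias, and the helper itself are hypothetical; only the model name comes from this document.

```python
import re

MODEL = "voyage-multimodal-3"  # model named in this document

# Conservative patterns so that values can be inlined into SQL text safely.
_SAFE_STAGE = re.compile(r"^@[A-Za-z0-9_.]+$")
_SAFE_FILE = re.compile(r"^[A-Za-z0-9_.\-/]+$")


def build_embed_query(stage: str, filename: str) -> str:
    """Return a SELECT that embeds one staged image with AI_EMBED."""
    if not _SAFE_STAGE.match(stage):
        raise ValueError(f"unexpected stage name: {stage!r}")
    if not _SAFE_FILE.match(filename):
        raise ValueError(f"unexpected file name: {filename!r}")
    return (
        f"SELECT AI_EMBED('{MODEL}', TO_FILE('{stage}', '{filename}')) "
        "AS EMBEDDING"
    )
```

With a Snowpark session in hand, the resulting string could be run with `session.sql(build_embed_query("@SHOE_UPLOADS", "query.jpg")).collect()`, assuming the stage exists and the account has access to the model.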
Snowflake Native Session – All processing happens within Snowflake using get_active_session(). Images are uploaded via session.file.put() to internal stages, eliminating external dependencies. Processing takes about 2 seconds, including upload, embedding, and search.
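The upload step might look roughly like the sketch below, which also shows one way to handle the temporary-file cleanup. The function name upload_photo, the stage @SHOE_UPLOADS, and the use of a temporary directory are assumptions for illustration. The `session` argument is expected to be a Snowpark session (for example, the one returned by get_active_session()).

```python
import os
import tempfile


def upload_photo(session, image_bytes: bytes, stage: str, filename: str) -> str:
    """Write the captured bytes to a temp file, PUT it to a stage, clean up.

    Returns the staged path, e.g. "@SHOE_UPLOADS/query.jpg".
    """
    tmp_dir = tempfile.mkdtemp()
    local_path = os.path.join(tmp_dir, filename)
    try:
        with open(local_path, "wb") as f:
            f.write(image_bytes)
        # auto_compress=False keeps the file as a plain image on the stage;
        # overwrite=True lets repeated captures reuse the same name.
        session.file.put(local_path, stage, auto_compress=False, overwrite=True)
    finally:
        # Remove the local copy whether or not the upload succeeded.
        if os.path.exists(local_path):
            os.remove(local_path)
        os.rmdir(tmp_dir)
    return f"{stage}/{filename}"
```

Doing the cleanup in a `finally` block means a failed upload does not leave captured photos behind on the local file system.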
Snowflake Vector Database – Pre-computed embeddings for X products stored natively. A multi-face scoring algorithm averages similarity across multiple product angles (faces 1-4) for improved accuracy. Uses VECTOR_COSINE_SIMILARITY for fast in-database matching.
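To make the multi-face scoring idea concrete, here is a small local illustration in plain Python. It is a sketch of the averaging described above, not the project's implementation: the app computes similarity inside Snowflake with VECTOR_COSINE_SIMILARITY, and the function names and the (product_id, face, vector) row layout here are assumptions.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_products(query, catalog, k=5):
    """Rank products by mean similarity over all of their stored faces.

    `catalog` is an iterable of (product_id, face, vector) rows, one row
    per product angle.
    """
    per_product = defaultdict(list)
    for product_id, _face, vector in catalog:
        per_product[product_id].append(cosine_similarity(query, vector))
    scores = {pid: sum(s) / len(s) for pid, s in per_product.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Averaging over faces means a product whose every angle resembles the query outranks one that matches from a single angle only. In SQL, the same idea would presumably be an AVG over VECTOR_COSINE_SIMILARITY grouped by product.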
Single-File Streamlit App – Complete application in one app.py file with an embedded database class. No external infrastructure, containers, or services required. Deploy directly to Snowflake Streamlit with a simple upload. Built-in error handling and file cleanup.
Last updated: 2025-08-01 V5.0 - Snowflake Native