Search Engine

Scalable Google-like search engine with service-oriented architecture, distributed MapReduce pipeline, inverted indexing, tf-idf and PageRank algorithms deployed on AWS.

Tech Stack

Python, MapReduce, Flask, AWS, RESTful APIs

Overview

Built a scalable, Google-like search engine using a service-oriented architecture, a distributed MapReduce pipeline, and inverted indexing. Designed and implemented a manager-worker MapReduce framework to process crawled web pages at scale. Applied information retrieval algorithms, including tf-idf and PageRank, to rank results by content relevance and link structure. Deployed the full search engine stack on AWS.
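
To illustrate how a MapReduce job can produce an inverted index with tf-idf statistics, here is a minimal single-process sketch. It is not the project's actual pipeline: the function names (`map_doc`, `reduce_term`, `build_inverted_index`), the tokenizer, and the base-10 idf formula are assumptions made for illustration, and the manager-worker distribution is replaced by a plain in-memory shuffle.

```python
import math
import re
from collections import Counter, defaultdict


def map_doc(doc_id, text):
    """Map phase: emit (term, (doc_id, term_frequency)) for one document."""
    # Assumed tokenizer: lowercase alphanumeric runs.
    terms = re.findall(r"[a-z0-9]+", text.lower())
    for term, tf in Counter(terms).items():
        yield term, (doc_id, tf)


def reduce_term(term, postings, num_docs):
    """Reduce phase: attach an idf value to one term's postings list."""
    # Assumed idf variant: log10(N / number of documents containing the term).
    idf = math.log10(num_docs / len(postings))
    return term, {"idf": idf, "postings": sorted(postings)}


def build_inverted_index(docs):
    """Run map, shuffle (group by term), and reduce in one process."""
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for term, posting in map_doc(doc_id, text):
            grouped[term].append(posting)
    return dict(reduce_term(t, p, len(docs)) for t, p in grouped.items())


# Hypothetical three-document corpus.
docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick quick dog"}
index = build_inverted_index(docs)
```

In a distributed setting, each call to `map_doc` would run on a worker over its own input split, and the grouping step would be handled by partitioning and sorting intermediate files before the reduce phase.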

Architecture

Service-oriented architecture built from modular, microservice-like components for indexing, ranking, and querying: a distributed MapReduce framework for indexing, an inverted index data structure, tf-idf and PageRank ranking algorithms, and a RESTful Index server that exposes results as JSON.
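
As a rough sketch of the query side, the function below scores documents from an inverted index and returns a JSON-ready result. Everything specific here is an assumption for illustration rather than the project's actual code: the `search` name, the weighted-sum formula `w * PageRank + (1 - w) * cosine similarity`, the default `w = 0.5`, the response shape, and the toy `INDEX` and `PAGERANK` values. It also matches documents containing any query term, which may differ from the real server's behavior.

```python
import json
import math
from collections import Counter, defaultdict

# Hypothetical inverted index: term -> idf and (doc_id, term_frequency) postings.
INDEX = {
    "fox": {"idf": 1.0, "postings": [(1, 1)]},
    "dog": {"idf": 0.5, "postings": [(2, 1), (3, 1)]},
    "quick": {"idf": 0.5, "postings": [(1, 1), (3, 2)]},
}
# Hypothetical precomputed PageRank scores per document.
PAGERANK = {1: 0.2, 2: 0.5, 3: 0.3}


def search(query, index, pagerank, w=0.5):
    """Rank documents by w * PageRank + (1 - w) * tf-idf cosine similarity."""
    q_tf = Counter(query.lower().split())
    q_vec = {t: tf * index[t]["idf"] for t, tf in q_tf.items() if t in index}
    q_norm = math.sqrt(sum(v * v for v in q_vec.values()))
    if q_norm == 0:
        return {"hits": []}

    # Squared length of each document's tf-idf vector over all indexed terms.
    doc_norm_sq = defaultdict(float)
    for entry in index.values():
        for doc_id, tf in entry["postings"]:
            doc_norm_sq[doc_id] += (tf * entry["idf"]) ** 2

    # Dot product of the query vector with each matching document vector.
    dots = defaultdict(float)
    for term, q_weight in q_vec.items():
        for doc_id, tf in index[term]["postings"]:
            dots[doc_id] += q_weight * tf * index[term]["idf"]

    hits = []
    for doc_id, dot in dots.items():
        d_norm = math.sqrt(doc_norm_sq[doc_id])
        if d_norm == 0:
            continue  # document has no weighted terms; skip it
        tfidf = dot / (q_norm * d_norm)
        score = w * pagerank.get(doc_id, 0.0) + (1 - w) * tfidf
        hits.append({"docid": doc_id, "score": score})
    hits.sort(key=lambda h: h["score"], reverse=True)
    return {"hits": hits}


result = search("quick dog", INDEX, PAGERANK)
print(json.dumps(result, indent=2))
```

A Flask route in the Index server could return a dictionary like `result` directly as its JSON response, with `w` supplied as a query parameter so callers can trade off link structure against content relevance.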