Crawler is a distributed web crawler written in Go, built without any crawler framework.
It is a personal project 😄: a distributed crawler system built from scratch using native code.
The main goal is to gain a deep understanding of Go's concurrency mechanisms and the design ideas behind distributed systems.
- Applies a breadth-first algorithm framework, combining embedded data crawling with information extraction, to implement the basic crawler task (see the first sketch below).
- Exploits Go's natural strengths in concurrency to distribute and schedule crawler tasks and meet the concurrency requirements (see the scheduler sketch below).
- Uses RPC to split the concurrent tasks of the single-machine version into independent services, turning it into a distributed crawler (see the RPC sketch below).
- Uses Docker + ElasticSearch to build the data storage backend, with the Go template library for data display (see the storage sketch below).