Colly

Lightning Fast and Elegant Scraping Framework for Gophers

Colly provides a clean interface to write any kind of crawler/scraper/spider.

With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

Features

Clean API
Fast (>1k requests/sec on a single core)
Manages request delays and maximum concurrency per domain
Automatic cookie and session handling
Sync/async/parallel scraping
Caching
Automatic encoding of non-unicode responses
Robots.txt support
Distributed scraping
Configuration via environment variables
Extensions

Example

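package main

import (
	"fmt"

	// Import path assumed from the Installation section of this README.
	"github.com/fresh8/colly"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	// Log every request before it is sent
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	// Start crawling from the seed URL
	c.Visit("http://go-colly.org/")
}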

See the _examples folder for more detailed examples.

Installation

Add colly to your go.mod file:

module github.com/x/y

go 1.14

require (
        github.com/fresh8/colly latest
)

Bugs

Bugs or suggestions? Visit the issue tracker or join #colly on freenode.

Other Projects Using Colly

Below is a list of public, open source projects that use Colly:

greenpeace/check-my-pages Scraping script to test the Spanish Greenpeace web archive.
altsab/gowap Wappalyzer implementation in Go.
jesuiscamille/goquotes A quotes scraper, making your day a little better!
jivesearch/jivesearch A search engine that doesn't track you.
Leagify/colly-draft-prospects A scraper for future NFL Draft prospects.
lucasepe/go-ps4 Search the PlayStation Store for your favorite PS4 games using the command line.
yringler/inside-chassidus-scraper Scrapes Rabbi Paltiel's web site for lesson metadata.
gamedb/gamedb A database of Steam games.
lawzava/scrape CLI for email scraping from any website.
eureka101v/WeiboSpiderGo A Sina Weibo (Chinese Twitter) scraper.
Go-phie/gophie Search, Download and Stream movies from your terminal
imthaghost/goclone Clone websites to your computer within seconds.
superiss/spidy Crawl the web and collect expired domains.
docker-slim/docker-slim Optimize your Docker containers to make them smaller and better.
seversky/gachifinder An agent for asynchronous scraping, parsing and writing to storage backends (Elasticsearch for now).
eval-exec/goodreads Crawl all tags and all pages of quotes from Goodreads.

If you are using Colly in a project please send a pull request to add it to the list.
