Simple web scraping framework based on cURL and str* functions. Requires PHP ^8.0 (can easily be downgraded to PHP 7).
There are some functions to simplify your script; they are listed below:
- upname => Format and return a string in name format (the first character of each word is uppercase)
- price => Return the float value of a price string of type 'xx$: 9.999,99'
- accents => Replace accentuation with equivalent characters
- strmstr => Return string after start3, after start2, after start1
- strpart => Return middle string between start and end strings
- strmpart => Return middle string between start2 and end string, which start2 is after start1
- Scrapping::cache => Save or return a cache
- Scrapping::cacheFolder => Set a custom folder for the cache
- Scrapping::json => Parse and print the response as JSON
- Scrapping::isOnSession => Tell whether a session is set with the server
- Scrapping::load => Reuse the session of previous connections
- Scrapping::useSession => Set/Return if session setting is enabled
- Scrapping::userAgent => Set/Return userAgent
- Scrapping::server => Set/Return server host base URL
- Scrapping::session => Set/Return server session id
- Scrapping::hasSession => Return whether a server session id is set
- Scrapping::sesionName => Set/Return server session cookie name
- Scrapping::get => Make and return a GET request to the server
- Scrapping::post => Make and return a POST request to the server
- Scrapping::proccess => Process the GET and POST requests to organize data
upname(string $text);
Pass the text string to format as the parameter; the result is the formatted string. Some examples below; the comment after each call shows the output:
echo upname('lara vieira');
// 'Lara Vieira'
echo upname('LARA VIEIRA');
// 'Lara Vieira'
echo upname('LEONARDO DE CÁPRIO');
// 'Leonardo de Cáprio'
echo upname('DON PEDRO II');
// 'Don Pedro II'
price(string $text);
Pass the text string to parse as the parameter; the result is the float value. Some examples below; the comment after each call shows the output:
echo price('US$: 3.567,56');
// 3567.56
echo price('R$: 3.456.234,45');
// 3456234.45
echo price('Price is R$: 234,45');
// 234.45
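To make the conversion concrete, here is a minimal sketch of how a string of type 'xx$: 9.999,99' could be turned into a float. The function name price_sketch, the regular expression, and the 0.0 fallback are assumptions made for illustration; this is not the library's own code.

```php
<?php
// Hypothetical helper for illustration only; the library's price() may differ.
function price_sketch(string $text): float
{
    // Grab the first run of digits with '.' thousands separators
    // and an optional ',' decimal part, e.g. "3.456.234,45".
    if (!preg_match('/\d[\d.]*(?:,\d+)?/', $text, $match)) {
        return 0.0; // assumption: no number found yields 0.0
    }
    // Drop the thousands separators, then turn the decimal comma into a dot.
    $normalized = str_replace(['.', ','], ['', '.'], $match[0]);
    return (float) $normalized;
}

echo price_sketch('R$: 3.456.234,45');
```

The order of the replacements matters: the dots are removed first so that the dot introduced for the decimal comma is the only one left.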
accents(string $text);
Pass the text to format as the parameter; the result is the formatted string. An example below; the comment shows the output:
echo accents('Aglomeração, Apóstolo, vô, vó');
// 'Aglomeracao, Apostolo, vo, vo'
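One way such a replacement can be done is with an explicit character map passed to strtr. The sketch below is only an illustration: accents_sketch and its deliberately short map are assumptions, and the library's accents() may use a different technique or a larger table. The file must be saved as UTF-8 for the map keys to match.

```php
<?php
// Hypothetical helper for illustration only; not the library's accents().
function accents_sketch(string $text): string
{
    // Small sample map; a real table would cover many more characters.
    $map = [
        'á' => 'a', 'à' => 'a', 'â' => 'a', 'ã' => 'a',
        'é' => 'e', 'ê' => 'e', 'í' => 'i',
        'ó' => 'o', 'ô' => 'o', 'õ' => 'o',
        'ú' => 'u', 'ç' => 'c',
        'Á' => 'A', 'À' => 'A', 'Â' => 'A', 'Ã' => 'A',
        'É' => 'E', 'Ê' => 'E', 'Í' => 'I',
        'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O',
        'Ú' => 'U', 'Ç' => 'C',
    ];
    // The array form of strtr replaces whole (multi-byte) keys in one pass.
    return strtr($text, $map);
}

echo accents_sketch('Aglomeração, Apóstolo, vô, vó');
```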
strmstr(
string $haystack,
string $start1,
string $start2,
string|null $start3=null
);
If start3 is passed, this function returns the part of haystack beginning at start3, where start3 is searched after start2, which in turn is searched after start1. Otherwise it returns the part of haystack beginning at start2, searched after start1. The return value includes the last start string passed, like strstr.
This function works roughly like a stack of strstr calls, except that, as the last example shows, each search begins just past the previous match:
strstr(strstr(strstr(haystack, start1), start2), start3)
Some examples below; the comment after each call shows the output:
echo strmstr('ABC ABC ABC', 'C');
// 'C ABC ABC'
echo strmstr('ABC ABC ABC', 'B', 'A');
// 'ABC ABC'
echo strmstr('ABC ABC ABC', 'B', 'B', 'A');
// 'ABC'
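The chained search can be sketched with strpos and a moving offset. This is a hypothetical re-implementation written to agree with the three examples above, not the library's code: the name strmstr_sketch, the variadic signature, and the false return on a missing start are all assumptions.

```php
<?php
// Hypothetical sketch for illustration only; not the library's strmstr().
function strmstr_sketch(string $haystack, string ...$starts): string|false
{
    $offset = 0; // where the next search begins
    $pos = 0;    // position of the most recent match
    foreach ($starts as $start) {
        $pos = strpos($haystack, $start, $offset);
        if ($pos === false) {
            return false; // assumption: mirror strstr() when nothing is found
        }
        // Continue just past this match so the next start is found after it.
        $offset = $pos + strlen($start);
    }
    // Like strstr(), the result includes the last start string.
    return substr($haystack, $pos);
}

echo strmstr_sketch('ABC ABC ABC', 'B', 'B', 'A');
```

Moving the offset past each match is what separates this from plain nested strstr calls, which would find the second 'B' at the very start of the first result.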
* This is my favorite one for web scraping.
strpart(
string $haystack,
string|null $start = null,
string|null $end = null,
bool $keep_start = false
);
This function returns the middle string in haystack between the first occurrence of the start string and the first occurrence of the end string after start.
- If start is null, it returns everything in haystack before the first occurrence of the end string.
- If end is null, it returns everything in haystack after the first occurrence of the start string.
- If keep_start is set to true (default is false), the function returns as normal, but with the start string included at the beginning of the result.
Some examples below; the comment after each call shows the output:
echo strpart('ABC ABC ABC', ' ', ' ');
// 'ABC'
echo strpart('ABC ABC ABC', ' ');
// 'ABC ABC'
echo strpart('ABC ABC ABC', end:' ');
// 'ABC'
echo strpart('ABC ABC ABC', ' ', ' ', true);
// ' ABC'
echo strpart('<h2>Subtitle</h2>', '>', '<');
// 'Subtitle'
echo strpart('<div><div>Content</div></div>', '<div>', '</div>');
// '<div>Content'
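The rules above can be sketched with strpos and substr. This is a hypothetical re-implementation that reproduces the documented examples, not the library's code: the name strpart_sketch and the false return when start or end is not found are assumptions.

```php
<?php
// Hypothetical sketch for illustration only; not the library's strpart().
function strpart_sketch(
    string $haystack,
    ?string $start = null,
    ?string $end = null,
    bool $keep_start = false
): string|false {
    $from = 0;      // where the returned text begins
    $searchEnd = 0; // where the search for $end begins
    if ($start !== null) {
        $pos = strpos($haystack, $start);
        if ($pos === false) {
            return false; // assumption: missing start yields false
        }
        $searchEnd = $pos + strlen($start);
        $from = $keep_start ? $pos : $searchEnd;
    }
    if ($end === null) {
        return substr($haystack, $from); // everything after start
    }
    $stop = strpos($haystack, $end, $searchEnd);
    if ($stop === false) {
        return false; // assumption: missing end yields false
    }
    return substr($haystack, $from, $stop - $from);
}

echo strpart_sketch('<h2>Subtitle</h2>', '>', '<');
```

The search for end always begins after start, even when keep_start is true, so a start string that contains end does not cut the result short.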
strmpart(
string $haystack,
string $start1,
string $start2,
string|null $end = null,
bool $keep_start = false
);
This function solves the problem shown in the last strpart example.
It returns the middle string in haystack between the first occurrence of the start2 string that comes after the first occurrence of the start1 string, and the first occurrence of the end string after start2.
- If end is null, it returns everything in haystack after the first occurrence of the start2 string that comes after the first occurrence of the start1 string.
- If keep_start is set to true (default is false), the function returns as normal, but with the start2 string included at the beginning of the result.
Some examples below; the comment after each call shows the output:
echo strmpart('<div><div>Content</div></div>', '<div>', '<div>', '</div>');
// 'Content'
echo strmpart('<div><div>Content</div></div>', '>', '>', '<');
// 'Content'
echo strmpart('<a id="link1"><h2 id="text1">Content</h2></a>', '<h2', 'id="', '"');
// 'text1'
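The two-step lookup can be sketched as two strpos calls followed by a substr. This is a hypothetical re-implementation that agrees with the examples above, not the library's code: the name strmpart_sketch and the false return when a marker is missing are assumptions.

```php
<?php
// Hypothetical sketch for illustration only; not the library's strmpart().
function strmpart_sketch(
    string $haystack,
    string $start1,
    string $start2,
    ?string $end = null,
    bool $keep_start = false
): string|false {
    $p1 = strpos($haystack, $start1);
    if ($p1 === false) {
        return false; // assumption: missing marker yields false
    }
    // Look for start2 only after the end of the start1 match.
    $p2 = strpos($haystack, $start2, $p1 + strlen($start1));
    if ($p2 === false) {
        return false;
    }
    $after = $p2 + strlen($start2);
    $from = $keep_start ? $p2 : $after;
    if ($end === null) {
        return substr($haystack, $from); // everything after start2
    }
    $stop = strpos($haystack, $end, $after);
    if ($stop === false) {
        return false;
    }
    return substr($haystack, $from, $stop - $from);
}

$html = '<a id="link1"><h2 id="text1">Content</h2></a>';
echo strmpart_sketch($html, '<h2', 'id="', '"');
```

Using start1 as an anchor is what lets the call skip the id of the outer a element and read the id of the h2 element instead.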