[Data Liberation] wp_rewrite_urls() #1893

Merged on Oct 28, 2024 (32 commits)
1ef710f Data liberation: Kickoff the project (adamziel, Oct 11, 2024)
234a8bf Port the URL rewriters from adamziel/site-transfer-protocol (adamziel, Oct 13, 2024)
819febd Port WP_HTML_Processor et al. from WordPress (adamziel, Oct 13, 2024)
0a6167b Move WordPress core files (adamziel, Oct 13, 2024)
826fe75 Outline the next steps (adamziel, Oct 13, 2024)
0633e6f Add PHPCS and CBF (adamziel, Oct 14, 2024)
4406fcf Update HTML API, fix unit tests (adamziel, Oct 15, 2024)
0cfd334 Merge branch 'trunk' into data-liberation-bring-in-php-parsers (adamziel, Oct 15, 2024)
b90a9d6 Bump CI PHP version to 8.1 (adamziel, Oct 15, 2024)
081535b Adjust the CI setup for PHP (adamziel, Oct 15, 2024)
aca88fe Run npm install instead of installing just nx (adamziel, Oct 15, 2024)
897af50 Use the correct nx project name (adamziel, Oct 15, 2024)
f7679b0 Remove the network functions and only lint the src directory (adamziel, Oct 15, 2024)
5b9ec7d Remove special casing for direct matching pathname prefixes (adamziel, Oct 15, 2024)
97fed71 Fix linting errors (adamziel, Oct 15, 2024)
96c1ce4 Move the additional functions to pbpcbf.php (adamziel, Oct 15, 2024)
e15408a Replace iterate_urls with url_matches (adamziel, Oct 15, 2024)
b788eea Lint PHP (adamziel, Oct 15, 2024)
b83933c Thoroughly test WP_URL_In_Text_Processor (adamziel, Oct 28, 2024)
fb0204c Enable tests for WP_Block_Markup_Processor (adamziel, Oct 28, 2024)
b1ea8dc Enable all PHPUnit tests (adamziel, Oct 28, 2024)
4335044 Enable URLParserWHATWGComplianceTests (adamziel, Oct 28, 2024)
91863ca Move $is_relative declaration closer to where it's used (adamziel, Oct 28, 2024)
d2aeea4 Add a single tricky test case for wp_rewrite_urls() (adamziel, Oct 28, 2024)
60db1e1 Preserve urlencoded data in the rewritten path (adamziel, Oct 28, 2024)
2da0386 Unit test urldecoding UTF-8 data (adamziel, Oct 28, 2024)
54bea02 Lint (adamziel, Oct 28, 2024)
54c901d Remove messing with private WP_HTML_Tag_Processor attributes (adamziel, Oct 28, 2024)
a62532b Remove the commented out dead code from WP_URL_In_Text_Processor (adamziel, Oct 28, 2024)
238decd Uncomment the public suffix list verification (adamziel, Oct 28, 2024)
37622ab PHP 8.1 compat (adamziel, Oct 28, 2024)
e12190f PHP 8.1 compliance (adamziel, Oct 28, 2024)
25 changes: 25 additions & 0 deletions .github/workflows/ci.yml
@@ -23,6 +23,31 @@ jobs:
            - uses: ./.github/actions/prepare-playground
            - run: npx nx affected --target=lint
            - run: npx nx affected --target=typecheck
    lint-and-test-php:
        name: 'Lint and test PHP'
        runs-on: ubuntu-latest
        steps:
            - uses: actions/checkout@v4
              with:
                  submodules: true
            - uses: actions/setup-node@v4
              with:
                  node-version: 20
            - run: npm install
            - name: Set up PHP
              uses: shivammathur/setup-php@v2
              with:
                  # @TODO: Also run the tests on PHP 7.2
                  php-version: '8.1'
                  tools: phpunit-polyfills
            - name: Install Composer dependencies
              uses: ramsey/composer-install@v3
              with:
                  ignore-cache: 'yes'
                  composer-options: '--optimize-autoloader'
                  working-directory: 'packages/playground/data-liberation'
            - run: npx nx run playground-data-liberation:lint:php
            - run: npx nx run playground-data-liberation:test:phpunit
    test-unit-asyncify:
        runs-on: ubuntu-latest
        needs: [lint-and-typecheck]
2 changes: 2 additions & 0 deletions .gitignore
@@ -13,6 +13,8 @@ packages/docs/site/src/model.json
.docusaurus
dist.zip
rollup.d.ts
.phpunit.cache
packages/playground/data-liberation/vendor

# dependencies
node_modules
4 changes: 4 additions & 0 deletions .gitmodules
@@ -1,3 +1,7 @@
[submodule "isomorphic-git"]
path="isomorphic-git"
url=https://github.com/adamziel/isomorphic-git.git
[submodule "wp-html-api"]
path="wp-html-api"
url=https://github.com/WordPress/wordpress-develop

3 changes: 2 additions & 1 deletion .vscode/settings.json
@@ -33,5 +33,6 @@
 	"C_Cpp.errorSquiggles": "disabled",
 	"git.branchProtection": [
 		"trunk"
-	]
+	],
+	"php.version": "7.2"
 }
@@ -0,0 +1,41 @@
<?php
/**
* This script regenerates the public suffix list from the publicsuffix.org website.
*/

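$suffixes = file_get_contents( 'https://publicsuffix.org/list/public_suffix_list.dat' );
if ( false === $suffixes ) {
	fwrite( STDERR, "Could not download the public suffix list.\n" );
	exit( 1 );
}

$lines = explode( "\n", $suffixes );
$tlds  = array();
foreach ( $lines as $line ) {
	// Remove surrounding whitespace, including a possible trailing "\r".
	$line = trim( $line );
	// Skip blank lines and "//" comments.
	if ( '' === $line || '/' === $line[0] ) {
		continue;
	}
	// Keep only single-label rules (raw TLDs). Multi-label rules, wildcards
	// such as "*.ck", and exceptions such as "!www.ck" all contain a dot.
	if ( false !== strpos( $line, '.' ) ) {
		continue;
	}
	$tlds[] = $line;
}

$php_file_path     = __DIR__ . '/../src/public_suffix_list.php';
$new_php_file_path = $php_file_path . '.swp';

// Write to a temporary file first so a failed run never leaves a truncated list behind.
$fp = fopen( $new_php_file_path, 'w' );
if ( false === $fp ) {
	fwrite( STDERR, "Could not open $new_php_file_path for writing.\n" );
	exit( 1 );
}
fwrite( $fp, "<?php\n\n" );
fwrite( $fp, "/**" );
fwrite( $fp, "\n * Public suffix list for detecting URLs with known domains within text." );
fwrite( $fp, "\n * This file is automatically generated by regenerate_public_suffix_list.php." );
fwrite( $fp, "\n * Do not edit it directly." );
fwrite( $fp, "\n * @TODO: Process wildcards and exceptions, not just raw TLDs." );
fwrite( $fp, "\n */\n\n" );
fwrite( $fp, "return array(\n" );
foreach ( $tlds as $tld ) {
	// var_export() emits a correctly quoted PHP string literal.
	fwrite( $fp, "\t" . var_export( $tld, true ) . " => 1,\n" );
}
fwrite( $fp, ");\n" );
fclose( $fp );

// rename() overwrites an existing target, so no separate unlink() is needed.
rename( $new_php_file_path, $php_file_path );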
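The generated file returns an array keyed by TLD, so a caller can test a hostname's last label with a single `isset()` lookup. A minimal sketch, assuming that array shape: `has_known_tld()` is a hypothetical helper, and the inline `$tlds` array only stands in for requiring `src/public_suffix_list.php`. Like the generator, it ignores wildcard and exception rules.

```php
<?php

/**
 * Hypothetical helper: returns true when the last label of $host is a key
 * in $tlds (the same shape as the generated public_suffix_list.php).
 */
function has_known_tld( string $host, array $tlds ): bool {
	// Normalize case and drop the trailing dot of names like "example.com.".
	$host = rtrim( strtolower( $host ), '.' );
	$dot  = strrpos( $host, '.' );
	if ( false === $dot ) {
		// A bare label such as "localhost" has no TLD to check.
		return false;
	}
	return isset( $tlds[ substr( $host, $dot + 1 ) ] );
}

// Stand-in for: $tlds = require __DIR__ . '/../src/public_suffix_list.php';
$tlds = array(
	'com' => 1,
	'org' => 1,
	'uk'  => 1,
);

var_dump( has_known_tld( 'wordpress.org', $tlds ) );   // bool(true)
var_dump( has_known_tld( 'file.invalidtld', $tlds ) ); // bool(false)
```

Keying the array by TLD (rather than storing a plain list) is what makes the lookup constant-time instead of a linear `in_array()` scan over thousands of entries.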
98 changes: 98 additions & 0 deletions packages/playground/data-liberation/bin/rewrite-urls.php
@@ -0,0 +1,98 @@
<?php

require_once __DIR__ . "/../bootstrap.php";

if ( $argc < 2 ) {
	echo "Usage: php {$argv[0]} <command> --file <input-file> --current-site-url <current site url> --new-site-url <target url>\n";
	echo "Commands:\n";
	echo "  list_urls: List all the URLs found in the input file.\n";
	echo "  migrate_urls: Migrate all the URLs found in the input file from the current site to the target site.\n";
	exit( 1 );
}

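$command = $argv[1];
$options = [];

// Collect "--name value" pairs into $options. A trailing flag without a value is ignored.
for ( $i = 2; $i < $argc; $i++ ) {
	// strpos() instead of str_starts_with() keeps this compatible with PHP 7.2.
	if ( 0 === strpos( $argv[ $i ], '--' ) && isset( $argv[ $i + 1 ] ) ) {
		$options[ substr( $argv[ $i ], 2 ) ] = $argv[ $i + 1 ];
		$i++;
	}
}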

if ( ! isset( $options['file'] ) ) {
	echo "The --file option is required.\n";
	exit( 1 );
}

$inputFile = $options['file'];
if ( ! file_exists( $inputFile ) ) {
	echo "The file $inputFile does not exist.\n";
	exit( 1 );
}
$block_markup = file_get_contents( $inputFile );

// @TODO: Decide whether the current site URL should always be required
// to populate $base_url.
$base_url = $options['current-site-url'] ?? 'https://playground.internal';

switch ( $command ) {
	case 'list_urls':
		echo "URLs found in the markup:\n\n";
		wp_list_urls_in_block_markup(
			[
				'block_markup' => $block_markup,
				'base_url'     => $base_url,
			]
		);
		echo "\n";
		break;
	case 'migrate_urls':
		if ( ! isset( $options['current-site-url'] ) ) {
			echo "The --current-site-url option is required for the migrate_urls command.\n";
			exit( 1 );
		}
		if ( ! isset( $options['new-site-url'] ) ) {
			echo "The --new-site-url option is required for the migrate_urls command.\n";
			exit( 1 );
		}

		echo "Replacing $base_url with " . $options['new-site-url'] . " in the input.\n\n";
		if ( ! is_dir( './assets' ) ) {
			mkdir( './assets/', 0777, true );
		}
		$result = wp_rewrite_urls(
			array(
				'block_markup'     => $block_markup,
				'base_url'         => $base_url,
				'current-site-url' => $options['current-site-url'],
				'new-site-url'     => $options['new-site-url'],
			)
		);
		if ( ! is_string( $result ) ) {
			echo "Error!\n";
			print_r( $result );
			exit( 1 );
		}
		echo $result;
		break;
	default:
		echo "Unknown command: $command\n";
		exit( 1 );
}

function wp_list_urls_in_block_markup( $options ) {
	$block_markup = $options['block_markup'];
	$base_url     = $options['base_url'] ?? 'https://playground.internal';
	$p            = new WP_Block_Markup_Url_Processor( $block_markup, $base_url );
	while ( $p->next_url() ) {
		// Skip empty relative URLs.
		if ( '' === trim( $p->get_raw_url() ) ) {
			continue;
		}
		echo '* ';
		// Report where the URL was found: an HTML attribute, a block attribute, or text.
		switch ( $p->get_token_type() ) {
			case '#tag':
				echo 'In <' . $p->get_tag() . '> tag attribute "' . $p->get_inspected_attribute_name() . '": ';
				break;
			case '#block-comment':
				echo 'In a ' . $p->get_block_name() . ' block attribute "' . $p->get_block_attribute_key() . '": ';
				break;
			case '#text':
				echo 'In #text: ';
				break;
		}
		echo $p->get_raw_url() . "\n";
	}
}
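The option loop in rewrite-urls.php pairs each `--name` flag with the argument that follows it. A minimal sketch of the same logic as a pure function, which makes that behavior easy to unit test; `parse_cli_options()` is a hypothetical name, not part of this PR, and it receives the arguments after the script name and command.

```php
<?php

/**
 * Hypothetical helper mirroring the option loop in rewrite-urls.php:
 * turns [ '--file', 'post.html', '--new-site-url', 'https://example.com' ]
 * into [ 'file' => 'post.html', 'new-site-url' => 'https://example.com' ].
 */
function parse_cli_options( array $args ): array {
	$options = array();
	$count   = count( $args );
	for ( $i = 0; $i < $count; $i++ ) {
		// Only "--name" followed by another argument becomes an option.
		if ( 0 === strpos( $args[ $i ], '--' ) && isset( $args[ $i + 1 ] ) ) {
			$options[ substr( $args[ $i ], 2 ) ] = $args[ $i + 1 ];
			// Skip the value that was just consumed.
			$i++;
		}
	}
	return $options;
}

// Equivalent to: php rewrite-urls.php migrate_urls --file post.html --new-site-url https://example.com
$options = parse_cli_options( array( '--file', 'post.html', '--new-site-url', 'https://example.com' ) );
print_r( $options );
```

One consequence of this design: because every `--name` consumes the next argument unconditionally, `--file --new-site-url x` yields `'file' => '--new-site-url'` and drops `x`, so a missing value surfaces later as a "file does not exist" error rather than as a parsing error.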
67 changes: 67 additions & 0 deletions packages/playground/data-liberation/bootstrap.php
@@ -0,0 +1,67 @@
<?php

require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-token.php";
require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-span.php";
require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-text-replacement.php";
require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-decoder.php";
require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-attribute-token.php";

require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-tag-processor.php";
require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-open-elements.php";
require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-token-map.php";
require_once __DIR__ . "/src/wordpress-core-html-api/html5-named-character-references.php";
require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-active-formatting-elements.php";
require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-processor-state.php";
require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-unsupported-exception.php";
require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-processor.php";

require_once __DIR__ . '/src/WP_Block_Markup_Processor.php';
require_once __DIR__ . '/src/WP_Block_Markup_Url_Processor.php';
require_once __DIR__ . '/src/WP_URL_In_Text_Processor.php';
require_once __DIR__ . '/src/WP_URL.php';
require_once __DIR__ . '/vendor/autoload.php';


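// Minimal polyfills for the WordPress core functions this package relies on outside of WordPress.
function _doing_it_wrong() {
	// No-op outside of WordPress.
}

function __( $input ) {
	// No translations: return the string unchanged.
	return $input;
}

// ENT_QUOTES | ENT_SUBSTITUTE matches the PHP 8.1+ default flags, so the output is the
// same on PHP 7.2, where htmlspecialchars() otherwise leaves single quotes unescaped.
function esc_attr( $input ) {
	return htmlspecialchars( $input, ENT_QUOTES | ENT_SUBSTITUTE );
}

function esc_html( $input ) {
	return htmlspecialchars( $input, ENT_QUOTES | ENT_SUBSTITUTE );
}

function esc_url( $url ) {
	return htmlspecialchars( $url, ENT_QUOTES | ENT_SUBSTITUTE );
}

// Attribute names whose values are URLs, as in WordPress core.
function wp_kses_uri_attributes() {
	return array(
		'action',
		'archive',
		'background',
		'cite',
		'classid',
		'codebase',
		'data',
		'formaction',
		'href',
		'icon',
		'longdesc',
		'manifest',
		'poster',
		'profile',
		'src',
		'usemap',
		'xmlns',
	);
}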
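`wp_kses_uri_attributes()` lists the attribute names whose values are URLs. A minimal sketch of how a URL scanner might consult such a list before trying to parse an attribute value; `is_uri_attribute()` and `example_uri_attributes()` are hypothetical, and the shortened list only stands in for the full polyfill. HTML attribute names are case-insensitive, hence the lowercase comparison.

```php
<?php

/**
 * Hypothetical stand-in for the wp_kses_uri_attributes() polyfill,
 * shortened for the example.
 */
function example_uri_attributes(): array {
	return array( 'action', 'cite', 'href', 'poster', 'src' );
}

/**
 * Hypothetical helper: true when $name is a URL-bearing attribute.
 * HTML attribute names are case-insensitive, so compare in lowercase.
 */
function is_uri_attribute( string $name ): bool {
	return in_array( strtolower( $name ), example_uri_attributes(), true );
}

var_dump( is_uri_attribute( 'HREF' ) );  // bool(true)
var_dump( is_uri_attribute( 'class' ) ); // bool(false)
```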
44 changes: 44 additions & 0 deletions packages/playground/data-liberation/composer.json
@@ -0,0 +1,44 @@
{
    "name": "wordpress/data-liberation",
    "prefer-stable": true,
    "require": {
        "ext-json": "*",
        "php": ">=7.2",
        "rowbot/url": "^4.0"
    },
    "require-dev": {
        "yoast/phpunit-polyfills": "2.0.0",
        "squizlabs/php_codesniffer": "3.*",
        "wp-coding-standards/wpcs": "3.1.0",
        "phpcompatibility/php-compatibility": "*"
    },
    "config": {
        "optimize-autoloader": true,
        "preferred-install": "dist",
        "allow-plugins": {
            "dealerdirect/phpcodesniffer-composer-installer": true
        }
    },
    "autoload": {
        "classmap": [
            "src/"
        ],
        "psr-4": {
            "WordPress\\DataLiberation\\": "src/WordPress"
        },
        "files": [
            "src/functions.php"
        ]
    },
    "autoload-dev": {
        "classmap": [
            "tests/"
        ]
    },
    "authors": [
        {
            "name": "WordPress Contributors",
            "email": "contributors@wordpress.org"
        }
    ]
}