notes-data.txt
bH
- entity types:
- endpoint (e.g. android app, pdf file, web site link, rss feed)
- resource (e.g. app, book, video series/podcast)
- site/organization/group
- preprocessor/uniformer processor types:
- input types:
- endpoint: a string, or a map with ref key and optional label
- endpoints:
- can be string, output one web endpoint
- can be map, in which case each value in it...
- can be a string, which is output as one endpoint with key type, and a ref with no label
- can be an array, in which each entry is an endpoint
- type (content only) - media type
- can be a string
- can be a list of strings (type, subtype, sub-subtype, etc., kind of like mime type)
- title
- can be a string, if so output just main title
- can be a map with main (title) and subtitle
- fee - can be boolean or map
- if map, key called "has_fee" with bool value, and optional description with more detail about fee
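The input shapes listed above could be normalized along these lines. This is only a sketch: the function names are hypothetical, the "web" key for a bare-string endpoints value is an assumption based on "output one web endpoint", and wrapping a single fee in a one-element array is an assumption based on the fee output type.

```javascript
// Hypothetical helpers; none of these names come from the project itself.

// endpoint: a string, or a map with a ref key and an optional label.
function normalizeEndpoint(input) {
  if (typeof input === "string") {
    return { ref: input }; // the string itself is the ref, with no label
  }
  const endpoint = { ref: input.ref };
  if (input.label !== undefined) endpoint.label = input.label;
  return endpoint;
}

// endpoints: a string (one web endpoint), or a map of type -> string or array.
function normalizeEndpoints(input) {
  if (typeof input === "string") {
    return { web: [normalizeEndpoint(input)] }; // assumed key name: "web"
  }
  const output = {};
  for (const [type, value] of Object.entries(input)) {
    // A lone value is treated as a one-entry list (assumption for lone maps).
    const entries = Array.isArray(value) ? value : [value];
    output[type] = entries.map(normalizeEndpoint);
  }
  return output;
}

// type: a string or a list of strings; always output a list.
function normalizeType(input) {
  return Array.isArray(input) ? input : [input];
}

// title: a string (main title only), or a map with main and subtitle.
function normalizeTitle(input) {
  if (typeof input === "string") return { title: input };
  const output = { title: input.main };
  if (input.subtitle !== undefined) output.subtitle = input.subtitle;
  return output;
}

// fee: a boolean, or a map with has_fee and an optional description.
function normalizeFee(input) {
  if (typeof input === "boolean") return [{ has_fee: input }];
  const entry = { has_fee: input.has_fee };
  if (input.description !== undefined) entry.description = input.description;
  return [entry];
}
```

Keeping each input type in its own small function means a new accepted shape (for example, a fee given directly as an array) would only touch one place.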
- output types:
- languages (content only?) - optional list of language code strings
- site_id - optional id of a site it is created by/part of
- part_of - optional id of another item of which item is a part
- title - human-readable name
- subtitle - optional subtitle, from site
- description (content only?) - describes the content
- main_category: main category of the content
- tags (content only) - categories of the content - list of strings
- target_audience (content only) - optional, list of audience types (strings) that content seems tailored to
- endpoint: a map, with keys ref (the input itself, if it was a string) and label
- endpoints: a map
- keys representing the endpoint type
- values should be lists, with each item an endpoint
- if we want to get only one endpoint and no more, prefer ones without a label, and prefer ones earlier in list to later ones
- TODO: other description refactoring and short_description
- content_type - list of strings - type of the item entry
- types #: multiple types allowed, no hierarchy, tag-like
#- brand - something with a logo, more or less :-)
# - TODO: use brand key instead?
#- org - an organization or group who produces content, has one or more brands (it can also serve as a brand)
- channel - a live (continuous? stream-like?) thing with audio or video etc. (not a channel with multiple elements like a podcast)
- rename this "station" or "broadcast" or something less ambiguous?
- how does it differ from stream media_type? maybe consolidate - see stream content_type
- series - a group of items with an order (set-like)
- group - like series, but unordered, and can have sub-items(?)
- item - an individual video/audio/text/etc.
- content_type: video, audio, text
- TODO: way to indicate if something is content without specifying content type? (perhaps content_type: true?, or see if any items have a content_type, or have it as related_ids subitem or content)
- stream: how does it differ from content_type channel? consolidate? see content_type channel
- container_type: how it's accessed
- app
- fee - array with map. earlier items in array take precedence, if we are before "until" date
- has_fee - boolean
- description - optional string
- until - ISO 8601 date/time; the time until which this fee entry applies
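Two of the output rules above, picking a single endpoint and resolving which fee entry currently applies, could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, and a fee entry with no "until" is assumed to apply indefinitely.

```javascript
// Hypothetical helpers illustrating the selection rules in the notes.

// Pick one endpoint of a given type: prefer ones without a label,
// and among equals prefer ones earlier in the list.
function pickEndpoint(endpoints, type) {
  const list = endpoints[type] ?? [];
  const unlabeled = list.find((endpoint) => endpoint.label === undefined);
  return unlabeled ?? list[0] ?? null;
}

// Resolve the fee entry in effect at a given time. Earlier entries take
// precedence as long as "now" is before their "until" date.
function currentFee(fees, now = new Date()) {
  for (const fee of fees) {
    // Assumption: an entry without "until" never expires.
    if (fee.until === undefined || now < new Date(fee.until)) {
      return fee;
    }
  }
  return null; // every entry has expired
}
```

For example, with `[{ has_fee: false, until: "2030-01-01T00:00:00Z" }, { has_fee: true }]`, the first entry is returned for any time before 2030 and the second one afterwards.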
TODO
- general
- add items in this list to gitlab issue tracker, and vice versa :-)
- consolidate this TODO section with processor types section in this file :-)
- schemas etc.
- make a schema for unprocessed/input objects too?
- types.yml schema
- set additionalProperties to false for most things?
- endpoints wikidata checking?
- validate endpoint refs with regexes (from wikidata maybe)?
- item info
- languages
- language codes - which system/standard should these follow?
- languages - if same content in multiple languages, do they have more than one content entry? what if same site but different content for each language?
- filter languages?
- fees
- add "from" field to fees? (like "until")
- login
- add marker or field that shows that login is needed for (some or all) content? maybe separate field from fees?
- example: Jem.tv
- endpoints
- add endpoint refs - e.g. Shaar HaBitachon topic (shaar_habitachon) & Shaar HaBitachon text (shaar_habitachon_chayenu_text), (TheWellsprings.org and Gezintahait Street content?)
- broken links
- add section on contact form?
- crawl for broken links?
- check for working https on http uris, announcements, and endpoint formatters etc.?
- (optionally) percent-encode aka urlencode refs?
- if so, correct google podcasts urls
- coalesce similar endpoint types (youtube for example)? if so, what if there are duplicates or endpoints that contain others (e.g. youtube channel that has playlist from endpoint)
- main link:
- use parent item web/main link sometimes? or maybe if indicated specifically?
- if so, what type of related_id(s) are parent items?
- allow is_main_url to be set to false on web links to disable automatically using web link as main link
- item and/or endpoint rss feeds - specify manually? or add to web-type endpoints? or scrape from web sites automatically? or both?
- maybe make generic "url" endpoint that can be customized with tags
- maybe even use this as a framework for endpoints
- third-party endpoint types: add flag instead of removing title?
- third-party, i.e. content does not necessarily originate from the same people who control the site
- verify using rel=me links? what if a site doesn't support/use them though?
- can then filter out third-party endpoints, instead of (or in addition to) filtering out everything without a title
- has third_party_content boolean
- rename third_party_content field to be more general, for hiding stuff from web interface?
- technical endpoint types, for example rss feeds: hide from web interface? tag with third_party_content?
- values
- whatsapp group - include link to whatsapp group? if so, under content item or org item etc.?
- add media_types for endpoints? (see media_types)
- add tags for endpoints? (e.g. "content" for web links etc. that you can read a book online in)
- endpoint type tags? (i.e. third_party_content)
- have tags field instead of additional fields?
- maybe have endpoint templates being items themselves?
- things that are already in labels?
- have content_type and media_type for endpoints to cover some of these?
- container endpoints relationship
- this would be for when we have a list of items that are containers (e.g. apps, podcasts) of a specific type, and endpoint links for each, e.g. podcast or app (like google play and apple store links for apps)
- have info on endpoint descriptors on which kind of container_types they can be shown for?
- have info on individual endpoints (e.g. tags) to indicate that e.g. this web link has the content at it (e.g. for books where can read it at link)
- endpoint formatter/url-type references to items instead of host related_ids?
- announcements
- add optional title field, to use for announcement link title (often can be title of blog post etc)?
- can pull from html metadata perhaps, maybe make whitelist for article sources which do this?
- also date etc. fields?
- content types
- periodicals
- for series content type, make a data field for how often items in series are released, if applicable? (daily, weekly, monthly, etc.)
- periodical container_type or way to mark periodicals? (e.g. periodical top-level key)
- period - how often it's released (weekly/monthly/irregularly/quarterly/etc.) e.g. moshiach times is released several times a year but maybe not monthly
- start and end dates and/or ranges? (e.g. mendy and the golem was in 1980s)
- media types
- allow multiple?
- maybe have main media type and media type list, similar to categories?
- if so, make the processing and rendering code handle it
- can remove "more" media type if so
- derivative media types (e.g. audio of video) - mark on its own?
- secondary content e.g. source sheets
- maybe highlight endpoints that have this content i.e. the richest content? (usually the original)
- examples
- how to handle something like chabad.org podcasts, where the same content has two video podcasts (one high quality, one low) and an audio podcast?
- how to handle something like soulwords content, where there is the video, and then podcasts and mp3s with the audio?
- aleph bais gimmel shiurim that have both audio and video
- have endpoint media_types? (see endpoints)
- automatically derive media_type and/or content_type from container_type=podcast? or maybe it might also have video.
- add "portal" media type or something for web sites?
- source_{description,subtitle}
- distinguish between descriptions and subtitles by the site itself, and descriptions by other people
- add remaining keys (description etc. at present) to item info pages
- perhaps use prefixes like site_ or author_ etc for their keys
- pull from sources?
- related_ids
- add "subsidiary" type or something like that (instead of more general and sometimes inaccurate "brand")
- brand related_id is hidden right now
- verification that references to types in type definitions exist (may be able to do this by templating the schema, and inserting the type values)
- automatically generate item_types key? (with values/tags like content, brand, app, container/access_method)
- container types
- new ones, like album, book?
- call it something more descriptive than container_type?
- hierarchies
- make able to generate hierarchy from related_ids
- also can copy relevant data from those items e.g. target audience, categories, etc.
- reverse related_ids relations calculated and displayed?
- multi-part items (multiple pieces that form a "item" content-type item)
- endpoint type? or related_ids type?
- make better alternative to "more" categories and media_types?
- make category related_ids type instead of categories?
- consolidate stream/channel media and content types?
- add indicator of whether the contents of an endpoint are full (has all episodes etc. in series) or non-full?
- see content_type periodical/series notes and endpoint notes
- add marker for whether something (e.g. periodical) can be subscribed (to) by email, and if so, where (the web page or etc.) to sign up to receive emails (if there is one)?
- podcasts to email - https://www.subscribebyemail.com/soulwords.org/feed/podcast/
- chabad.org switch to video/audio? https://www.chabad.org/multimedia/audio_cdo/aid/3949917/jewish/7-Adar-and-Moshe-Rabbeinu.htm https://www.chabad.org/multimedia/video_cdo/aid/1794748/jewish/7-Adar-and-Moshe-Rabbeinu.htm
- compilation
- when eleventy is in watch mode, have it delete old _site in between?
- and also have it rerun on removed files
- maybe more fields autogenerated/extracted from item sources metadata
- podcast / rss feed image and description and date and more metadata?
- twitter/facebook image and description fields that are used by those sites (and can be used by others too) to make a box with info and image
- chabad.org itemprops
- meta info - icon from webapp, description, title, apple-itunes-app tag, etc
- opensearch? (chabad.org pages have)
- info from musicbrainz/youtube-playlist-rss/wikidata/other endpoints etc.
- favicon
- json-ld
- breadcrumbs list for related_ids?
- @type for related_id types?
- itunes-app
- htmllint etc.?
- templates
- are eleventy excerpts supposed to be in data.page.excerpt?
- use "log" filter to debug?
- rendering etc.
- media type icons - maybe from bootstrap icons
- icons by item badges
- microdata on item pages?
- blog posts - english dates too?
- item links and more macros/functions in templating etc.
- eleventyConfig.addFilter("item_id_to_info_link", function(value) { return `items/${this.slug(value)}`; });
- or shortcode?
- sitemap?
- add previous page and next page links to individual blog posts?
- add language info for items and endpoints
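For the item link filter versus shortcode question in the rendering notes, both options could be registered side by side, roughly as below. This is a sketch, not the project's config: `configureItemLinks`, `slugifyId`, `escapeHtml`, and the `item_link` shortcode name are hypothetical, and the local slug helper stands in for whatever slug function the real config uses. Only `eleventyConfig.addFilter` and `eleventyConfig.addShortcode` are Eleventy's own API.

```javascript
// Hypothetical slug helper; the real config may rely on Eleventy's own slug filter.
function slugifyId(value) {
  return String(value)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric into "-"
    .replace(/^-+|-+$/g, "");    // trim leading and trailing "-"
}

// Minimal HTML escaping for link text.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Call this from the Eleventy config function, passing eleventyConfig through.
function configureItemLinks(eleventyConfig) {
  // Filter: turns an item id into the path of its info page.
  eleventyConfig.addFilter("item_id_to_info_link", function (value) {
    return `items/${slugifyId(value)}`;
  });

  // Shortcode: outputs a complete anchor element for the item.
  eleventyConfig.addShortcode("item_link", function (id, label) {
    return `<a href="items/${slugifyId(id)}">${escapeHtml(label)}</a>`;
  });
}
```

In a Nunjucks template the filter would be used as `{{ item.id | item_id_to_info_link }}` and the shortcode as `{% item_link item.id, item.title %}`. The filter leaves the surrounding markup to the template, while the shortcode keeps the link markup in one place.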