bookmarks.md


Table of Contents

  1. Bookmarks
    1. Misc
    2. Business
      1. Misc
      2. Taxes / Tax Office / Self-Employment Classification
    3. Personal Development
    4. Society
    5. Software-Development:Software_Development:
      1. Misc
      2. Lessons from 50 years experience (Project-Management etc)…
      3. Essays on Programming
      4. Some Collection of Best of Talks
      5. Project Management
      6. Team Management
      7. Software Architectures / Diagrams
      8. JSON
      9. YAML
      10. RegEx
      11. Python
      12. Django
      13. Javascript / CSS
      14. Java
      15. REST APIs / Web Development / HTML
      16. HTML
      17. Golang
      18. Git / GitHub / Versioning
      19. Shell/Bash/Zsh
      20. Makefiles
      21. Databases
      22. Cronjobs
      23. Editors
    6. DevOps / Security
      1. Misc
      2. Logging
      3. Dashboarding
      4. Monorepo vs Multirepo
      5. CI/CD-Pipeline
      6. Cloud Provisioners
      7. Site Reliability
      8. Microservices / Serverless
      9. Docker/Containers:Docker:
      10. Kubernetes:Kubernetes:
      11. SSH
      12. General Computer/Networking/… Security:Security:
      13. Security Tools
    7. Data Science / ML / NLP:DataScience:
      1. Misc
      2. Compilations
      3. Interesting Analysis:Analysis:
      4. Kaggle:Kaggle:
      5. Team (Management, Hiring, Organization, …)
      6. Optimizers / Learn Rates
      7. CLI introspection (Visidata/xsv/…)
      8. Jupyter Notebooks
      9. Google Colab
      10. Streamlit
      11. Pytorch/fastai
      12. Web Scraping
      13. Datasets
      14. Pandas:Pandas:
      15. Data Annotation
      16. Data Cleaning
      17. Data Exploration / Feature Engineering
      18. Data Testing
      19. Data Visualization:Visualization:
      20. NLP:NLP:
      21. Computer Vision
      22. Time Series
      23. Graphs
      24. Semi Supervised Learning
      25. Deployments:DevOps:
    8. Data Engineering
      1. Misc
      2. Spark
      3. Data Drift Detection
    9. Math:Math:
    10. Physics

Bookmarks

Misc

Business

Misc

  • https://clutch.co/app-developers/resources/what-is-riskiest-assumption-test - RAT vs MVP
  • https://entrepreneurshandbook.co/the-real-reasons-why-a-vc-passed-on-your-startup-917c30103ecb - pretty complete check list
  • https://medium.com/swlh/5-more-stupid-things-entrepreneurs-should-never-say-when-fundraising-67121dee1c1b
    • “If we capture X% of the market…”: Instead, market sizing estimates show how much money is already being spent, which means capturing market share requires taking customers from an entrenched player. It’s going to cost a lot more time and money than this founder thinks.
    • “Before we start, would you mind signing this NDA?”: This founder thinks ideas are more valuable than execution. … care about whether or not the founder can create a highly scalable business around it.
    • “Our team is led by technical founders”: instead of validating the market opportunity to check for actual demand, they immediately started building it. That’s not going to end well. … learn that entrepreneurship isn’t like Field of Dreams, and “If you build it, they will come” doesn’t work
    • “We’re not just co-founders. We’re also best friends.”: instead of identifying an important market opportunity, researching it rigorously, and then building the best team possible to capture it, they sat around for hours trying to come up with a “great idea.” … they’re building a company and they think it’s going well because a few of their friends/family/frat brothers agreed to download the beta version of their app
    • “We’re growing organically!”: founder doesn’t realize that, for venture-backed companies, organic growth is functionally equivalent to magical growth. Yes, it looks cool on a chart, but it doesn’t represent anything compelling about the underlying fundamentals of the business
  • https://medium.com/@aarondinin/the-two-letter-word-that-will-destroy-your-company-1e66a71b5349
    • If we can close X% of the leads we’ve got coming into the pipeline… If only X% of our website visitors click… If we can just get picked up by a couple of media outlets… If just one of our videos goes viral… If we get just one investor to commit…
    • Even in the few examples I’ve given, I hope you’ve noticed how easy and dangerous it is to make assumptions about the outcomes of difficult work and then cover it with the word “if.”
    • obfuscate tasks that are enormously complex, difficult, time-consuming, expensive, and perhaps impossible
    • replacing my original “if-statement” with the following phrase: “If we can do something that takes 12 months and costs us a million dollars.”
  • https://medium.com/@aarondinin/10-things-a-vc-would-rather-do-with-your-nda-than-sign-it-9f7cff9175ee
    • Investors meet with dozens of companies each week, and many of those companies have similarities based around the investor’s investment thesis. By signing your NDA, they could inadvertently put themselves in a position of not being able to discuss something necessary for considering another investment.
    • A venture capitalist has no interest in “stealing” your idea or proprietary information because venture capitalists don’t build companies. That’s not their business model. Their business model is to invest in companies, and that’s a full time job. They don’t have time to do whatever work is involved in building the company you’re pitching.
    • The job of a venture-backed entrepreneur is to grow a company to the point at which it can exit and generate a return on the VC’s investment.
    • investors don’t want their entrepreneurs wasting time trying to figure out new and proprietary ways of building and growing companies
  • https://entrepreneurshandbook.co/when-startups-fail-its-usually-because-founders-overlook-these-3-obvious-things-24aa05d05d40
    1. If you don’t have customers, you don’t have a business: what matters is getting people to buy your product.
    2. People who don’t know about your product can’t buy it: if nobody knows about it, nobody can buy it. If you’re not good at marketing, nothing else about your business can succeed. You need to obsess about that above everything else.
    3. Successful businesses solve people’s problems: building amazing products isn’t what successful entrepreneurs do. Instead, having an amazing product is the outcome of an entrepreneur’s hard work. Entrepreneurship isn’t about building things. Entrepreneurship is about solving people’s problems.
  • https://medium.com/swlh/why-startups-should-never-hire-straight-a-students-75b176b8e907
  • https://medium.com/young-coder/how-microsoft-beat-the-innovators-dilemma-5b78e3692ed3

Taxes / Tax Office / Self-Employment Classification

Personal Development

Society

Software-Development :Software_Development:

Misc

Lessons from 50 years experience (Project-Management etc)…

  • https://medium.com/@karlwiegers/growing-a-culture-of-software-quality-eb39a090e76b - Interview about code reviews Excerpt:

    • One obvious indicator is a lack of customer satisfaction. But you don’t want to wait until after delivery to discover quality problems. That’s one advantage of agile approaches. Some working software is delivered periodically so you can begin collecting that feedback and make appropriate course corrections.
    • Few organizations measure how much of their total effort is spent on rework, both during development and post-delivery. If you do measure that, you could get a pretty scary number.
    • In a healthy software engineering culture, quality is a priority for all team members and managers. One cultural principle of a group I led was that we prefer to have a peer, rather than a customer, find a defect.
    • “You can pay me now, or you can pay me a lot more later.”
    • The best software engineer I ever knew got nervous if he couldn’t find people to review his code.
    • I would never want to work in an organization in which peer reviews were not a standard part of the culture.
    • Invite people to review your work early and often, formally and informally.
    • If someone reviews 1000 lines of your code and suggests some better approaches, you’re probably not going to go back and incorporate all those changes.
    • The other reason for reviewing before you think you’re done is psychological. When you think something is finished, you really don’t want someone to tell you that it’s not. You can have a lot of psychological resistance to review input at that point, because you’re ready to move on to the next task. It’s easy to push back against any suggestions for changes. This is not a constructive attitude toward peer reviews or a good use of a reviewer’s time.
    • If someone walks out of a review feeling beat up and swears that they’re never going to go through that again, that’s definitely not a sign of a good review process in a healthy culture.
    • Each of us must reach a point where not only are we comfortable soliciting input on our work, but we actually become uncomfortable if we haven’t had others examine what we’ve created before we inflict it on an unsuspecting world.
  • https://medium.com/swlh/building-a-healthy-software-engineering-culture-59183b93389d Excerpt:

    • Quality is the top priority; long-term productivity is a natural consequence of high quality.
    • But discussing just what principles, values, and attitudes are important will help align the team members so they can make decisions and take actions that are consistent with that shared philosophy.
    • Of course, culture evolves over time. You just hope it doesn’t devolve. I’ve seen that happen too, like when a new manager came in to take over my group after I stepped down as the manager. He didn’t share our commitment to a quality-driven culture and continuous improvement, and some of what we had achieved gradually eroded away. That was discouraging.
    • Suppose a manager claims that quality is a top priority. But then he doesn’t want to give project teams the time to perform peer reviews, or he penalizes people if bugs are found in their work during a review.
    • Managers — and enthusiastic team members — must recognize that people and organizations can only absorb change at a certain rate.
  • https://medium.com/@karlwiegers/mind-the-crap-gap-61f314fe9678 Excerpt:

    • Hold your hand up in front of you with your thumb and index finger about one inch apart. In many situations, that short distance represents the difference between quality and crap. Most of the time, all it takes to bridge that “crap gap” is to do a little more questioning, listening, thinking, measuring, or testing before delivering the product or declaring the job complete. Ignoring the crap gap can be expensive for the workers and annoying for their victims.
    • A sign in my college chemistry laboratory asked: “If you don’t have time to do it right, when will you have time to do it over?”
    • Okay, but personally, I like to verify correctness before declaring victory.
    • It’s up to management to shape a company culture in which individual employees feel both empowered and expected to do the job well.
    • One good way to handle situations like this is to point out to the provider that the defective work does not appear to be up to their standards.
    • Moreover, when I see something obviously done wrong like this, it makes me wonder how many other problems there were that I just can’t see. I don’t fully trust the provider anymore.
    • Companies that do measure what they spend on rework — both internal and external failure — often are shocked at the numbers. Reducing rework increases your profit; it’s that simple.
  • https://medium.com/swlh/six-estimation-safety-tips-6832b8f8c42a Excerpt:

    1. A goal is not an estimate
      • Commitments should be based on plausible estimates, not just desired targets.
      • work should not be considered overdue if there was never any realistic likelihood of completing it by the dictated target date
    2. The estimate you produce should be unrelated to what you think the requester wants to hear
      • don’t change your estimate simply because someone doesn’t care for it
      • There’s no reason to reduce a thoughtfully crafted estimate simply because someone isn’t happy with it.
      • You can examine assumptions, try different estimation methods, explore risks, or negotiate scope, resources, or quality. But don’t just cave to make someone smile.
    3. The correct answer to any request for an estimate is “Let me get back to you on that.”
      • So before you say, “Sure, no problem,” make sure you know what you’re getting into.
    4. Avoid giving single-point estimates
      • present an estimate as a range instead of a single value. Identify the minimum possible duration (or some other measurable factor) for the work, the most likely or expected value, and the maximum expected duration barring some catastrophic event
    5. Incorporate contingency buffers into estimates
    6. Record actual outcomes and compare them to the estimates
      • if you record what you did today, then tomorrow that is historical data. It’s not more complicated than that
      • In fact, if you don’t do that, then the next time you’re not estimating, you are guessing — again.
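Tips 4 to 6 above can be sketched in a few lines of Python. This is a minimal illustration, not something taken from the article: the `Estimate` class, the 20% default buffer, and the example numbers are all assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """A three-point estimate in working days (hypothetical structure)."""
    minimum: float
    most_likely: float
    maximum: float

    def __post_init__(self):
        # A range only makes sense if the three points are ordered.
        if not (self.minimum <= self.most_likely <= self.maximum):
            raise ValueError("expected minimum <= most_likely <= maximum")

    def with_buffer(self, fraction: float = 0.2) -> float:
        """Most likely value plus a contingency buffer (20% is an assumed default)."""
        return self.most_likely * (1 + fraction)

    def contains(self, actual: float) -> bool:
        """True if the recorded actual landed inside the estimated range."""
        return self.minimum <= actual <= self.maximum


def relative_error(estimate: Estimate, actual: float) -> float:
    """Signed error of the most likely value against the recorded actual."""
    return (actual - estimate.most_likely) / estimate.most_likely


# Hypothetical task: estimated as a range, then compared with what really happened.
login_feature = Estimate(minimum=3, most_likely=5, maximum=10)
actual_days = 7

print(login_feature.with_buffer())
print(login_feature.contains(actual_days))
print(f"{relative_error(login_feature, actual_days):+.0%}")
```

Keeping the recorded `actual_days` next to the original range is the whole point of tip 6: after a few tasks, the relative errors become the historical data for the next estimate.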
  • https://medium.com/swlh/negotiating-achievable-commitments-6575b3d73b20 Excerpt:

    Successful projects — and successful relationships — are based on realistic commitments, not on fantasies and empty promises.

    1. We must make commitments freely
    2. Commitments must be explicitly stated and clearly understood by all parties involved
      • Consider writing a brief summary of each major commitment you exchange with someone else. This confirms the communication and establishes a shared expectation of accountability.
      • I keep two running lists in my daily life: To Do, and Waiting For.

    Negotiating Commitments

    • Separate the people from the problem
    • Focus on interests, not positions
    • Invent options for mutual gain
    • Insist on using objective criteria
      • And remember that an estimate is not the same as a promise.
      • A common cause of commitment failure is making “best case” commitments rather than “expected case” commitments.

    Modifying Commitments

    • If it becomes apparent that your team won’t meet a commitment, tell those affected promptly. Don’t pretend you’re on schedule until it’s too late to make adjustments. Letting someone know early on that you can’t fulfill a commitment builds credibility and respect for your integrity, even if the stakeholders aren’t thrilled that you can’t deliver on the original promise

    Commitment Ethics

    • A meaningful commitment ethic includes the ability to say “no.” e.g.:

      • “Sure, I can do that by Friday. What would you like me to not do instead?”
      • “We can’t get that feature into this iteration and still finish on schedule. Can it wait until the next iteration, or would you rather defer something else?”
      • “I can do that, but it’s not as high on the priority list as my other obligations. Let me suggest someone else who might be able to help you more quickly than I can.”
    • Never make a commitment that you know you can’t keep.

    • “… our morale will be higher if we’re not set up for certain failure.”

  • https://medium.com/swlh/hearing-the-voice-of-the-customer-the-product-champion-approach-24c61b526131 Excerpt:

    • Only knowledgeable and empowered customer representatives can answer questions and flesh out high-level requirements.
    • My concern about the phrase on-site customer is simply that it is singular.
    • Most products have multiple distinct user classes, who have largely different needs. Certain groups — the favored user classes — will be more important than others to the project’s business success. Sometimes user classes aren’t even people: they’re other information systems or hardware components that derive services from the system you’re building.
    • A more realistic approach is to enlist a small number of product champions to serve as key user representatives.
    • If this group couldn’t all agree on some issue, Don made the call. Someone has to make these kinds of decisions; it’s better if a knowledgeable and respected user rep does it than if the BA or developers choose.
    • They weren’t co-located with the development team, although they were accessible enough to provide quick feedback when needed.
    • Each champion has the time available to do the job.
    • Each champion has the authority to make binding decisions at the user requirements level.
    • The moral of the story is that your customer reps must commit to making the project contributions you need from them, and then they need to do the job.
    • The ideal product champion is an actual member of the user class he or she represents. This isn’t always possible, particularly when building commercial products for a faceless market. You might need to use surrogates in place of real user representatives.
    • When your product champions are former — not current — users, ask yourself whether a disconnect has grown over time between their experiences and the needs today’s users have. Their understanding could be obsolete.
    • Managers sometimes are uncomfortable delegating decision-making authority to ordinary users.
    • First, those managers probably aren’t current members of the user class. Second, busy managers rarely have the time to devote to a serious requirements development effort. It’s better to have managers provide input to the business requirements
    • Software developers who think they can speak for the users. Rarely, this situation can work. More commonly, even developers with considerable domain experience will find that actual users of the new product will bring a different — and more reliable — perspective.
    • Your stakeholders might hesitate to have knowledgeable users spend time working with BAs or through developers on requirements. Here’s how I see it. You’re going to get the customer input eventually. It’s a lot less painful to get it early and on an ongoing basis during development.
    • If your customers won’t collaborate in making sure the product meets their needs, I question their commitment to the project’s success.
  • https://medium.com/swlh/requirements-review-challenges-e3ffe3ad60ef Excerpt:

    • If someone said you could only perform a single quality practice on a software project, what would you choose? I’d pick peer reviews of requirements.
    • Several companies reported that they avoided up to ten hours of labor for every hour they invested in inspecting requirements documents and other software deliverables. Who wouldn’t want to try a technique that might offer a 1,000 percent return on investment?
    • The prospect of thoroughly examining a long requirements document is daunting.
    • Even given a document of moderate size, all reviewers might carefully examine the first part and a few stalwarts will study the middle, but probably no one will look at the last part.
    • perform incremental reviews throughout requirements development
    • large review teams increase the cost of the review, make it hard to schedule meetings, and have difficulty reaching agreement on issues
      • Fourteen people cannot agree to leave a burning room, let alone agree on whether or not a particular requirement is correct.
      • Make sure each participant is there to find defects, not to be educated or to protect a political position.
      • Understand which perspective (such as user, developer, or tester) each inspector represents. (+ send just one representative to the inspection meeting)
      • Establish several small teams to inspect the requirements in parallel and combine their defect lists, removing any duplicates.
      • supply the requirements set to the other interested stakeholders in advance so they have an opportunity to contribute their input
    • don’t let debates in the form of written comments substitute for talking to each other
    • A prerequisite for a formal review meeting is that the participants have examined the material being reviewed ahead of time.
    • In fact, if you’re invited to participate in a requirements review and don’t have adequate time to go over the material in advance on your own, don’t even bother attending the meeting. It’s a waste of everyone’s time.
    • My general rule is: “Review early and often, formally and informally.”

Essays on Programming

  • https://www.benkuhn.net/progessays/

  • https://blog.nelhage.com/post/computers-can-be-understood/

  • https://mcfunley.com/choose-boring-technology This is controversial. There are also many examples where choosing boring technology ended up as terrible technology that needed maintenance much earlier (like choosing PHP in 2010 or the MySQL quoted in the article; most of the time PostgreSQL ends up with fewer problems, and you still need to be able to migrate to Spanner/Hive/Spark/CockroachDB if you are successful)

    IMHO, the main problem is the idea of shipping without planned maintenance (development). That’s absurd in every other engineering discipline. When we build a house, car, plane, … we know from the start when we need to do maintenance and which parts should be replaced after how much usage (and most of the time, we won’t replace them with outdated technology). Updating the software at least once a quarter and working on at least some issues (with the benefit of keeping knowledge about the internal processes) should be the minimum and planned from the start. But usually in software, after shipping we expect to minimize follow-up costs and call it operations. And then we end up with a whole deprecated stack that somehow works, but that nobody can really work on or improve any longer.

  • https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction

    • duplication is far cheaper than the wrong abstraction
    • prefer duplication over the wrong abstraction
    • IMHO, a good hint comes from Go best practices: don’t write common, util or other generic classes. If you can’t assign an abstraction to one topic, that’s a good sign of a bad abstraction, or of an abstraction that should only be used internally in a somewhat fatter package
  • https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ tldr; It’s a huge mess, don’t expect anything (not even unicode)

    In theory, the best approach would be to store the name as a free-form data field, add an implementation of how to interpret it, and then build algorithms on top of that using the most appropriate interface for working with the name. But really, who does that? Is there even one example of it?
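The "free data field plus interpretation" idea could look roughly like the sketch below. It is only an illustration under stated assumptions: `PersonName`, `NameInterpreter`, `VerbatimInterpreter`, and `LastTokenInterpreter` are hypothetical names invented here, not an existing library.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class PersonName:
    """Stores the name exactly as entered; no structure is assumed."""
    raw: str


class NameInterpreter(Protocol):
    """Hypothetical interface: one interpretation strategy per use case."""
    def display(self, name: PersonName) -> str: ...
    def sort_key(self, name: PersonName) -> str: ...


class VerbatimInterpreter:
    """Safest default: show the name as given, sort by case-folded text."""
    def display(self, name: PersonName) -> str:
        return name.raw

    def sort_key(self, name: PersonName) -> str:
        return name.raw.casefold()


class LastTokenInterpreter:
    """Assumes 'given ... family' order, which is wrong for many real names."""
    def display(self, name: PersonName) -> str:
        return name.raw

    def sort_key(self, name: PersonName) -> str:
        tokens = name.raw.split()
        return tokens[-1].casefold() if tokens else ""


def sorted_names(names, interpreter: NameInterpreter):
    """The algorithm only talks to the interpreter, never to name parts."""
    return sorted(names, key=interpreter.sort_key)


people = [PersonName("Ada Lovelace"), PersonName("Björk"), PersonName("Grace Hopper")]
print([n.raw for n in sorted_names(people, VerbatimInterpreter())])
print([n.raw for n in sorted_names(people, LastTokenInterpreter())])
```

The stored value never changes; only the interpreter does. That keeps a wrong assumption (such as "the last token is the family name") confined to one replaceable class.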

  • https://sockpuppet.org/blog/2015/03/06/the-hiring-post/ Very detailed:

    • but try to make interviewing unimportant (you need good coding skills, not good talking or social-stress skills)
    • prefer coding tests taken from practice
    • but keep objective scoring criteria, such as test coverage, algorithmic complexity, spotted problem A, B, …
    • if interviewing: keep a warm-up phase with unimportant personal questions, keep the interviews highly structured (with robot-like scripts for the interviewer, although they won’t be loved), and make them the same/comparable for everyone
    • still allow free Q&A, but make it shorter and give it less influence on the overall result
    • make it respectful for the interviewed person: free books etc. to compensate for the work

    … not sure what company size the author is talking about; for small companies, the main problem is usually to get at least one competent worker, not to select between several highly skilled candidates

    … in general: I’m personally doubtful about long interviewing procedures with several rounds. There are many studies showing that in the end, they really don’t help. There are 2 reliable proxies: high potential (graduate degree) [problem: isn’t productive from day 1] and working successfully for someone else [expensive]. I personally think it’s in any case better to make a quick decision, probably relying on something like https://en.wikipedia.org/wiki/Secretary_problem, than to overestimate one’s own interview scoring procedures. One problem of long interviewing is that the best candidates will find another job before the selection process has finished (unless you are the one and only company)
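For reference, the classic stopping rule from the Secretary problem linked above can be sketched as follows. This is a minimal sketch under the textbook assumptions (candidates arrive in random order, each gets a single comparable score, and a rejected candidate cannot be called back); the function name `secretary_pick` and the example scores are made up for illustration.

```python
import math


def secretary_pick(scores):
    """Return the index chosen by the classic 1/e stopping rule.

    Candidates are seen one at a time, in the order given.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one candidate")
    # Observation phase: look at roughly n/e candidates without hiring anyone.
    cutoff = int(n / math.e)
    best_seen = max(scores[:cutoff], default=float("-inf"))
    # Selection phase: hire the first candidate who beats everyone observed.
    for index in range(cutoff, n):
        if scores[index] > best_seen:
            return index
    # Nobody beat the benchmark, so the last candidate is taken by default.
    return n - 1


# Hypothetical interview scores in arrival order.
print(secretary_pick([6, 3, 8, 5, 9, 7, 4, 2]))
print(secretary_pick([5, 9, 3, 4, 6, 2, 1, 7]))
```

In the first list the rule stops at the candidate scoring 8 and never sees the 9; in the second the best candidate falls into the observation phase and the rule ends up with the last one. That is expected: for a long random sequence the rule finds the single best candidate only about 37% of the time, but it does decide quickly.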

  • https://programmingisterrible.com/post/176657481103/repeat-yourself-do-more-than-one-thing-and

    • Repeat yourself, but don’t repeat other people’s hard work. Repeat yourself: duplicate to find the right abstraction first, then deduplicate to implement it.
    • With “Don’t Repeat Yourself”, some insist that it isn’t about avoiding duplication of code, but about avoiding duplication of functionality or duplication of responsibility. This is more popularly known as the “Single Responsibility Principle”, and it’s just as easily mishandled. (like many boolean flags etc)
    • A given module often gets changed because it is the easiest module to change, rather than the best place for the change to be made. In the end, what defines a module is what pieces of the system it will never be responsible for, rather than what it is currently responsible for.
    • In the end, we call our good decisions ‘clean code’ and our bad decisions ‘technical debt’, despite following the same rules and practices to get there.
  • https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/

    • All non-trivial abstractions, to some degree, are leaky.
      • iterating over a large two-dimensional array can have radically different performance if you do it horizontally rather than vertically,
      • But in some cases, certain SQL queries are thousands of times slower than other logically equivalent queries. A famous example of this is that some SQL servers are dramatically faster if you specify “where a=b and b=c and a=c”
      • network libraries like NFS and SMB let you treat files on remote machines “as if” they were local, sometimes the connection becomes very slow or goes down, and the file stops acting like it was local, and as a programmer you have to write code to deal with this.
      • C++ string classes are supposed to let you pretend that strings are first-class data. They try to abstract away the fact that strings are hard and let you act as if they were as easy as integers. Almost all C++ string classes overload the + operator so you can write s + “bar” to concatenate. But you know what? No matter how hard they try, there is no C++ string class on Earth that will let you type “foo” + “bar”, because string literals in C++ are always char*’s, never strings.
      • And you can’t drive as fast when it’s raining, even though your car has windshield wipers and headlights and a roof and a heater, all of which protect you from caring about the fact that it’s raining (they abstract away the weather), but lo, you have to worry about hydroplaning (or aquaplaning in England) and sometimes the rain is so strong you can’t see very far ahead so you go slower in the rain, because the weather can never be completely abstracted away
    • So the abstractions save us time working, but they don’t save us time learning.
    • And all this means that paradoxically, even as we have higher and higher level programming tools with better and better abstractions, becoming a proficient programmer is getting harder and harder.
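The two-dimensional array example from the list above can be tried out with a small experiment. This is only a sketch: the grid size and function names are arbitrary choices, and timings depend on the machine and interpreter, so no expected numbers are given. In CPython part of any gap comes from the extra index lookups rather than from the memory layout alone; in C or with NumPy arrays the memory access pattern is the dominant effect.

```python
import timeit

ROWS, COLS = 300, 300
grid = [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]


def sum_row_major(matrix):
    """Visit elements row by row, the order in which each inner list is stored."""
    total = 0
    for row in matrix:
        for value in row:
            total += value
    return total


def sum_column_major(matrix):
    """Visit elements column by column, switching to another row on every step."""
    total = 0
    for c in range(len(matrix[0])):
        for r in range(len(matrix)):
            total += matrix[r][c]
    return total


# Logically equivalent: both traversal orders produce the same sum.
assert sum_row_major(grid) == sum_column_major(grid)

# The abstraction ("a 2D array") hides the difference; the clock does not.
for fn in (sum_row_major, sum_column_major):
    seconds = timeit.timeit(lambda: fn(grid), number=5)
    print(f"{fn.__name__}: {seconds:.3f}s")
```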
  • https://blog.nelhage.com/post/reflections-on-performance/

    • Performance — in particular, being notably fast — is a feature in and of its own right, which fundamentally alters how a tool is used and perceived.
    • Fast tools don’t just allow users to accomplish tasks faster; they allow users to accomplish entirely new types of tasks, in entirely new ways.
    • “performance last” model will rarely, if ever, produce truly fast software
    • The basic architecture of a system — the high-level structure, dataflow and organization — often has profound implications for performance.
    • the more 1% regressions you can avoid in the first place, the easier this work is.
    • attempts to add performance to a slow system often add complexity, in the form of complex caching, distributed systems, or additional bookkeeping for fine-grained incremental recomputation
    • if a tool is fast in the first place, these additional layers may be unnecessary to achieve acceptable overall performance, resulting in a system that is in net much simpler for a given level of performance
  • https://web.archive.org/web/20220418020617/https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/

    • Distributed systems are different because they fail often
    • Writing robust distributed systems costs more than writing robust single-machine systems
    • Robust, open source distributed systems are much less common than robust, single-machine systems
    • Coordination is very hard
    • If you can fit your problem in memory, it’s probably trivial
    • “It’s slow” is the hardest problem you’ll ever debug
    • Implement backpressure throughout your system
    • Find ways to be partially available
    • Metrics are the only way to get your job done
    • Use percentiles, not averages
    • Learn to estimate your capacity
    • Feature flags are how infrastructure is rolled out
    • Choose id spaces wisely
    • Exploit data-locality
    • Writing cached data back to persistent storage is bad
    • Computers can do more than you think they can
    • Use the CAP theorem to critique systems
    • Extract services
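The "Use percentiles, not averages" point from the list above is easy to see with a few numbers. The sketch below uses a simple nearest-rank percentile; the latency values are invented for illustration, and real monitoring systems may compute percentiles differently (for example with interpolation or histograms).

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds: mostly fast, a few very slow.
latencies_ms = [20] * 95 + [2000] * 5

print("mean:", statistics.mean(latencies_ms))
print("p50: ", percentile(latencies_ms, 50))
print("p99: ", percentile(latencies_ms, 99))
```

With these made-up numbers the mean is 119 ms, a latency that no request actually had. The median (20 ms) describes the typical request and the 99th percentile (2000 ms) exposes the slow tail that the average blurs.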
  • https://www.stilldrinking.org/programming-sucks - just epic and an exact description of programming world

Some Collection of Best of Talks

Nowhere near complete and not a curated top selection, but some are interesting, so let’s keep a little list of them

Project Management

| | Understand | | Not Understand |
|---|---|---|---|
| Aware | Known-Knowns: I’m aware of potential problems and I understand how to solve them | <= Work <= | Known-Unknowns: I’m aware of potential problems but I don’t know how to solve or avoid them |
| | ^ Work ^ | | ^ Research ^ |
| Not Aware | Unknown-Knowns: There are problems out there I’m not aware of but I’ve got a good idea of how to approach new problems | <= Training <= | Unknown-Unknowns: There are problems out there I haven’t even considered and I don’t know what I’ll do when I encounter them |

  • https://medium.com/analysts-corner/telepathy-and-clairvoyance-requirements-practices-that-dont-work-8945e8a02979

  • https://itnext.io/why-creating-software-is-always-harder-than-expected-14d241f70656

  • https://blog.devgenius.io/rule-one-on-page-one-of-the-book-of-software-development-is-9b9cb1e75ce3 The first rule on page one of the book of software development is to never underestimate the complexity and difficulty of creating software. Almost every aspect of creating software seems simple but turns out to be more complex than anyone imagines.

    Nothing in software development is simple and anyone who says it is either doesn’t know what they are doing or is an idiot. The best approach is to assume everything is complex until you have proven and clarified it’s simple.

    Assume

    • mistakes will happen
    • things will go wrong
    • requirements will change and more will be discovered
    • people will leave
    • no one knows the software required
    • the wrong software will be made and changed
  • https://blog.devgenius.io/software-development-is-a-marathon-run-in-sprints-d61eed6cb784 - You cannot sprint a marathon. Software projects have the expectations of Stephen King (6 pages a day and 360 pages per month) but turn out to be more like George R.R. Martin (Game of Thrones): a book every 5 years, and his last book was released in 2012.

    Software development is a long-term process that usually takes months, split into 2 week sprints with many made up deadlines.

    Agile software development creates small deadlines, daily updates and constant deadlines. This diverts time, resources and attention to reporting on progress, instead of making progress.

    These regular deadlines are like someone sprinting at the end of every mile in a marathon. Instead of a steady pace, like most marathon runners do, people get out of breath by sprinting.

    Software development needs a steady pace.

    The deadlines make development teams rush, force development and trade speed for quality. This causes mistakes, lower quality and tires the development team out.

    Anytime you are building something, it’s common to have underestimated the cost, difficulty and time it will take.

    Software development is a slow process that is more complex than everyone assumes.

    People know it’s difficult but assume their project will be different, not for a valid reason but just because they are on it.

    Steady progress is the process that creates better software and doesn’t break developers.

Team Management

Software Architectures / Diagrams

  • https://c4model.com/ - “abstraction-first” approach to diagramming software architecture. It describes a software system at four levels:
    • (System) Context
    • Containers (applications and data stores)
    • Components (not separately deployable units)
    • Code (elements like names of classes/interfaces/etc), probably generated by IDE
  • https://github.com/structurizr/dsl - way to create Structurizr software architecture models based upon the C4 model using a textual domain specific language (DSL)
  • https://github.com/structurizr/cli - command line utility for Structurizr, designed to be used in conjunction with the Structurizr DSL, and supports the following commands/functionality:
    • push content to the Structurizr cloud service/on-premises installation
    • pull workspace content as JSON
    • lock a workspace
    • unlock a workspace
    • export diagrams to PlantUML, Mermaid, WebSequenceDiagrams, DOT, and Ilograph; or a DSL workspace to JSON
    • list elements within a workspace
    • validate a JSON/DSL workspace definition
  • https://github.com/mingrammer/diagrams - Diagram as Code for prototyping cloud system architectures
  • https://draw.io == https://app.diagrams.net/ - open-source, cross-platform diagramming tool

JSON

YAML

RegEx

  • https://regex101.com/ - can help you build and test RegExes, as well as break them down and identify their individual parts
  • https://regex-vis.com/ - generates a graph from a RegEx which is very helpful for understanding what the expression actually does
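
The same "break it down into individual parts" idea can also be applied directly in code. A minimal sketch, assuming Python's standard `re` module: the pattern is written in verbose mode with named groups, so each part carries its own comment. The names `ISO_DATE` and `parse_iso_date` are made up for this illustration.

```python
import re

# Verbose mode ignores whitespace in the pattern and allows inline comments,
# so every part of the expression can be read (and changed) on its own.
ISO_DATE = re.compile(
    r"""
    (?P<year>\d{4})                 # four-digit year
    -                               # literal separator
    (?P<month>0[1-9]|1[0-2])        # month 01..12
    -                               # literal separator
    (?P<day>0[1-9]|[12]\d|3[01])    # day 01..31 (no per-month validation)
    """,
    re.VERBOSE,
)


def parse_iso_date(text):
    """Return the named parts as integers, or None if the text does not match."""
    match = ISO_DATE.fullmatch(text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}
```

Named groups keep the calling code readable (`result["month"]` instead of `match.group(2)`), which helps once the expression grows.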

Python

  1. Best Practices

  2. Python Internals

  3. Modules

    1. Misc

    2. String/Text utils

    3. CLI

      • https://github.com/tiangolo/typer - build great CLIs. Easy to code. Based on Python type hints.

      • https://typer.tiangolo.com/ - fastapi equivalent for CLI tools -> use it for the next CLI tool if possible

      • https://github.com/onelivesleft/PrettyErrors - readable stack traces for terminals with colors

      • https://github.com/Delgan/loguru - Python logging made (stupidly) simple. Loguru is a library which aims to bring enjoyable logging in Python.

        Did you ever feel lazy about configuring a logger and used print() instead?… I did, yet logging is fundamental to every application and eases the process of debugging. Using Loguru you have no excuse not to use logging from the start, this is as simple as from loguru import logger.

        Also, this library is intended to make Python logging less painful by adding a bunch of useful functionalities that solve caveats of the standard loggers. Using logs in your application should be an automatism, Loguru tries to make it both pleasant and powerful.

      • https://github.com/willmcgugan/rich - Python library for rich text and beautiful formatting in the terminal
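
To illustrate what "based on Python type hints" means for a CLI, here is a rough sketch of the general idea using only the standard library (`argparse` and `inspect`). It is not Typer's API or implementation; `build_parser`, `run` and `greet` are hypothetical names for this example, and it assumes plain annotations such as `str`, `int` and `bool`.

```python
import argparse
import inspect


def build_parser(func):
    """Derive an argparse parser from a function's signature and type hints."""
    parser = argparse.ArgumentParser(prog=func.__name__, description=func.__doc__)
    for name, param in inspect.signature(func).parameters.items():
        # Parameters without an annotation are treated as strings.
        if param.annotation is inspect.Parameter.empty:
            arg_type = str
        else:
            arg_type = param.annotation
        if param.default is inspect.Parameter.empty:
            # No default value: required positional argument.
            parser.add_argument(name, type=arg_type)
        elif arg_type is bool:
            # Boolean with a default: exposed as a flag that switches it on.
            parser.add_argument(f"--{name}", action="store_true", default=param.default)
        else:
            # Anything else with a default: optional, converted via the hint.
            parser.add_argument(f"--{name}", type=arg_type, default=param.default)
    return parser


def run(func, argv=None):
    """Parse argv (or sys.argv when None) and call func with the parsed values."""
    args = build_parser(func).parse_args(argv)
    return func(**vars(args))


def greet(name: str, times: int = 1, shout: bool = False) -> str:
    """Greet NAME one or more times."""
    line = f"Hello {name}"
    if shout:
        line = line.upper()
    return "\n".join([line] * times)
```

With this, `run(greet, ["Ada", "--times", "2"])` calls `greet("Ada", times=2)`; the function signature is the only place where the interface is declared.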

    4. Code Quality/CI

    5. Configuration / Environments

    6. Typing

    7. Data Science

    8. Databases

    9. Diagrams / QR-Codes

  4. Debugging / Profiling

  5. Documentation

Django

Javascript / CSS

Java

REST APIs / Web Development / HTML

  1. Misc

  2. Web Application testing

HTML

  • https://itnext.io/html-underrated-tags-119ef3e45b94
    • picture: to have alternative imgs depending on media without all the css/js mash
    • progress: progressbars just in plain html
    • base: don’t forget it :-)
    • input type=“…”: we have plain html date, datetime-local, month, week, time, color, range input types
    • details: includes a summary, and only clicking on it shows all the details, again pure html
    • mark: use it to mark (highlight) something, instead of styling another element
    • abbr: easy to forget, but very useful
    • div contenteditable: to create an editable field (and get rid of textarea)
  • https://learntheweb.courses/topics/html-semantics-cheat-sheet/
  • https://javascript.plainenglish.io/9-html-tips-nobody-is-talking-about-2022-edition-b7c095029030
    • Fallback image:

    • Directly call from HTML: a link with a tel: href, e.g. 123-456-7890

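    • Translate: the translate attribute marks whether text should be translated, e.g. translate="no" for “Don’t translate this!”, while unmarked text such as “This can be translated to any language.” may be translated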

    • Poster: attribute helps you set an image to be shown while the video is downloading

      <video controls poster="/images/w3html5.gif">
          <source src="movie.mp4" type="video/mp4">
          <source src="movie.ogg" type="video/ogg">
          Your browser does not support the video tag.
      </video>
      

Golang

Git / GitHub / Versioning

  1. Misc

  2. Best Practices / Linters

  3. Config + Tools

Shell/Bash/Zsh

Makefiles

Databases

A list of items that need a skilled DBA to fully understand, but that are written well enough to make clear why you should use a managed service wherever possible :-)

Cronjobs

Editors

  1. https://elsewebdevelopment.com/neovim-vs-helix-which-is-the-best-vi-vim-style-modal-editor/

DevOps / Security

Misc

Logging

Dashboarding

A dashboard is there to prove that the data are easily accessible, comparable, and trackable. Only once that is done can they be actionable.

Trapped data is useless data.

Monorepo vs Multirepo

CI/CD-Pipeline

Cloud Provisioners

  1. For all platforms

  2. AWS

  3. GCP

  4. Openshift

  5. Terraform

    1. Misc

    2. Tools

    3. Linters / Code quality

  6. Ansible & Co

Site Reliability

  1. Misc

    including:

  2. Incident Management / Alerting

  3. Deployment Strategies

Microservices / Serverless

Read the details in https://towardsdatascience.com/api-as-a-product-how-to-sell-your-work-when-all-you-know-is-a-back-end-bd78b1449119

Docker/Containers :Docker:

  1. Misc

  2. Linters

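    |                 | **Dockle**                      | **Hadolint**    | **Docker Bench for Security**                 | **Clair**            | **Anchore** | **Trivy** |
    |-----------------|---------------------------------|-----------------|-----------------------------------------------|----------------------|-------------|-----------|
    | **Target**      | Image                           | Dockerfile      | Host, Docker Daemon, Image, Container Runtime | Image                |             |           |
    | **How to run**  | Binary                          | Binary          | ShellScript                                   | Binary               |             |           |
    | **Dependency**  | No                              | No              | Some dependencies                             | No                   |             |           |
    | **CI Suitable** | Yes                             | Yes             | No                                            | No                   |             |           |
    | **Purpose**     | Security Audit, Dockerfile Lint | Dockerfile Lint | Security Audit, Dockerfile Lint               | Scan Vulnerabilities |             |           |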
  3. Docker Best Practices & Docker Security

  4. Tools

Kubernetes :Kubernetes:

  1. Misc

  2. Yes/No

  3. Linters

  4. Kubernetes Security :Security:

  5. AWS/EKS

    1. Misc

    2. EKS / Deployment

    3. CI/CD

  6. HowTo-Guides (Kubeconfig, GCP, Rancher, KIND, Private Container Registry)

  7. Monitoring

  8. Service Meshes

SSH

General Computer/Networking/… Security :Security:

  1. Misc

  2. Anti-Patterns

    • https://www.ncsc.gov.uk/whitepaper/security-architecture-anti-patterns

      1. ‘Browse-up’ for administration
        • When administration of a system is performed from a device which is less trusted than the system being administered.
        • A better approach: ‘browse-down’
      2. Management bypass
        • When layered defences in a network data plane can be short-cut via the management plane.
        • A better approach: layered defences in management planes
      3. Back-to-back firewalls
        • When the same controls are implemented by two firewalls in series, sometimes from different manufacturers.
        • A better approach: do it once, and do it well
        • The one exception: There is one example of using two firewalls back-to-back that makes more sense; to act as a contract enforcement point between two entities that are connecting to each other.
      4. Building an ‘on-prem’ solution in the cloud
        • When you build - in the public cloud - the solution you would have built in your own data centres.
        • A better approach: use higher order functions
      5. Uncontrolled and unobserved third party access
        • When a third party has unfettered remote access for administrative or operational purposes, without any constraints or monitoring in place.
        • A better approach: a good contract, constrained access and a thorough audit trail
      6. The un-patchable system
        • When a system cannot be patched due to it needing to remain operational 24/7.
        • A better approach: design for ‘easy’ maintenance, little and often
    • https://www.ncsc.gov.uk/blog-post/protect-your-management-interfaces

      1. Protecting devices used for administration
        • Ensure privileged users carry out their administrative duties in a ‘clean’ (more trusted) environment.
        • Ensure privileged users handle their email and web browsing in a separate ‘dirty’ (less trusted) environment.
        • Consider the ‘dirty’ environment to be sacrificial, and design it in a way that anticipates compromise. When it is compromised, you’d like to be able to find out when and how (and be able to easily recover it into a good state), but the breach shouldn’t have a big impact on your important systems.
        • Use strong authentication mechanisms, such as 2-factor authentication.
      2. Reducing the exposure of management interfaces
        • Expose management interfaces to dedicated management networks where you can. At the very least, limit authorised inbound IP addresses to those used by dedicated management devices.
        • Deploy jump servers where you need to expose management interfaces to less trusted networks. Ensure these are very well configured and maintained.
        • Use only the latest versions of secure protocols and configure them to use strong authentication mechanisms. For example, use the latest version of SSH rather than Telnet, and use public-key authentication to secure access.
        • Create similar tiers in your management networks to those in the systems being managed.
        • Collect and automatically alert on security-relevant events against your management infrastructure.
      3. Ensuring there’s a trail of breadcrumbs
        • Record the commands issued by users on jump servers, and store them securely.
        • Ensure all network and server infrastructure audit records are also kept securely.
        • Send these records to a service that administrators don’t have readily available access to, and would need multiple people to modify.
        • Automate the analysis of logs to identify suspicious behaviour.
    • https://www.ncsc.gov.uk/blog-post/debunking-cloud-security-myths

      On balance we think well-engineered SaaS is better for security than the alternatives.

      Consider whether your IT security engineering team is going to be better or worse at security management for a major commodity product, offered - as a service - by the major vendor who developed it.

      SaaS offerings may feel at times like an uncontrolled and uncontrollable space where your staff will share private data in an unconstrained fashion. Our experience is that this can be true, but that it’s better to provide them with easy to understand guidance on which tools are appropriate to use, and where to seek help, rather than to ban them altogether.

      I assert it is better to spend our local security effort on problems unique to our organisations, rather than worrying about patching, maintaining, and monitoring services that others can do better than us.

      In summary, I would like to leave you with the message that whilst SaaS is not a silver bullet for cyber security, in many situations the security benefits outweigh the risks.

    • https://withblue.ink/2020/04/08/stop-writing-your-own-user-authentication-code.html

    • https://medium.com/@joelgsamuel/ip-address-access-control-lists-are-not-as-great-as-you-think-they-are-4176b7d68f20
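
The NCSC advice above to "automate the analysis of logs to identify suspicious behaviour" can start very small. A minimal sketch, assuming OpenSSH-style log lines of the form `Failed password for <user> from <ip> port <n> ssh2`; the function name `suspicious_sources` and the threshold of 3 are arbitrary choices for this illustration, not part of the guidance.

```python
import re
from collections import Counter

# Assumed log format (OpenSSH-style); adjust the pattern to the real log source.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) from (?P<ip>\S+) port \d+"
)


def suspicious_sources(log_lines, threshold=3):
    """Return source addresses with at least `threshold` failed logins."""
    failures = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group("ip")] += 1
    return {ip: count for ip, count in failures.items() if count >= threshold}
```

In line with the guidance, such a check belongs on the service that receives the audit records, not on the jump server whose users it is watching.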

  3. How-To-Guides

Security Tools

Data Science / ML / NLP :DataScience:

Misc

Compilations

Interesting Analysis :Analysis:

Kaggle :Kaggle:

Team (Management, Hiring, Organization, …)

Optimizers / Learn Rates

CLI introspection (Visidata/xsv/…)

Jupyter Notebooks

Google Colab

Streamlit

Pytorch/fastai

Web Scraping

Datasets

  • https://github.com/awesomedata/awesome-public-datasets - repository on GitHub of high quality topic-centric public data sources. They are collected and tidied from blogs, answers, and user responses. Almost all of these are free with a few exceptions here and there

  • https://tinyletter.com/data-is-plural - weekly newsletter of useful/curious datasets. You can find a huge archive of datasets in their Google Doc. Just hit ctrl + f for a topic you’d like to look into and see the dozens of results that pop up.

  • https://data.world/datasets/open-data - Data World is an open data repository containing data contributed by thousands of users and organizations all across the world. It contains data that is really hard to find elsewhere. In particular, healthcare is one of the more difficult fields to get publicly available data from (due to privacy concerns). But luckily, Data World has 3667 free health datasets you can use for your next project.

  • https://archive.ics.uci.edu/ml/index.php - UCI Machine Learning Repository is a collection of databases, domain theories, and data generators used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times, making it one of the top 100 most cited “papers” in all of computer science.

  • https://www.data.gov/

  • https://github.com/neutraltone/awesome-stock-resources - A curated list of awesome stock photography, video and illustration websites.

  • https://datasetsearch.research.google.com

  • https://www.europeandataportal.eu/de/homepage

  • https://tfhub.dev/ - Pretrained Models from Google & DeepMind

    • Text (Embeddings)
    • Image (Classification, Feature Vector, Generator, Other)
    • Video (Classification)
  • https://archive.org/details/GeneralIndex - gigantic index of the words and short phrases contained in more than 100 million journal articles, including many paywalled papers; see also: https://www.nature.com/articles/d41586-021-02895-8

  • https://www.si.edu/openaccess - contains 2D and 3D representations of cultural, scientific, historical, artistic, technical and design exhibits from its 19 museums, 9 research centers, libraries, archives and the National Zoo, plus research data and collections data

Pandas :Pandas:

  1. Misc

  2. Time Series

  3. Performance

  4. Big Data Alternatives

Data Annotation

Data Cleaning

Data Exploration / Feature Engineering

Data Testing

  • https://great-expectations.readthedocs.io/en/latest/index.html - helps teams save time and promote analytic integrity by offering pipeline tests. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time). Pipeline tests are like unit tests for datasets: they help you guard against upstream data changes and monitor data quality.
  • https://towardsdatascience.com/validate-your-pandas-dataframe-with-pandera-2995910e564 - pandera provides a flexible and expressive API for performing data validation on dataframes to make data processing pipelines more readable and robust.
  • https://towardsdatascience.com/data-drift-it-can-come-at-you-from-anywhere-b78eb186855 - nice presentation of the many (and often ugly) kinds of data shift
    • Level shift: Obviously
    • Variance shift: Subtler
    • Variance decrease: That’s a drift too
    • Peak shifts inside a period: The relative position of that peak (within a period) shifted
    • some simple spectral analysis will indicate a shift. So, you have to bring in some knowledge of signal processing to catch this data drift
    • Contextual data — another way to catch the drift
    • Phase-shift/delay: For ML models training and inferencing on time-series data, a slight phase delay can generate totally wrong predictions. Basically, the model was trained like “If X1 and X2 were similar then predict Y0, otherwise predict Y1”
    • Some are drift, some not: industrial or manufacturing scenarios, process recipes and settings change all the time
    • Measurement/Sensor drift: The input data streams and the generative processes may be fine but there may be drift on the sensor
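
The first three kinds of drift in the list above (level shift, variance shift, variance decrease) can be checked with plain summary statistics. A minimal sketch using only the standard library; `drift_report` and both thresholds are assumptions made for this illustration, not something taken from the linked article, and a real pipeline test would also account for sample size and seasonality.

```python
from statistics import mean, stdev


def drift_report(reference, current, level_threshold=3.0, variance_ratio=2.0):
    """Compare a current window of values against a reference window."""
    ref_mean, ref_std = mean(reference), stdev(reference)
    cur_mean, cur_std = mean(current), stdev(current)
    if ref_std == 0:
        raise ValueError("reference window has no variation to compare against")

    # Level shift: the mean moved by more than `level_threshold`
    # reference standard deviations.
    level_shift = abs(cur_mean - ref_mean) > level_threshold * ref_std

    # Variance shift: the spread grew or shrank by more than `variance_ratio`
    # (a decrease counts as drift too).
    ratio = cur_std / ref_std
    variance_shift = ratio > variance_ratio or ratio < 1 / variance_ratio

    return {"level_shift": level_shift, "variance_shift": variance_shift}
```

Peak shifts, phase delays and sensor drift need more than this (spectral analysis or contextual data, as the list notes), so treat the sketch as a first guard rather than a full detector.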

Data Visualization :Visualization:

NLP :NLP:

  1. Misc

  2. Search

  3. Summarisation

  4. Chatbots

  5. Data Augmentation :Augmentations:

  6. Tools

    In all use cases, the information extraction is based on analysing the semantic relationships expressed by the component parts of each sentence:

  7. Embeddings

  8. Metrics

  9. Attention / Transformers / …

  10. BERT

  11. XLNet

  12. GPT

Computer Vision

  1. Misc

    than looking at tiny patches and summing up the probabilities; that’s why ResNets are robust to shuffled pictures, but also why they are so sensitive to adversarial networks

  2. Tools

  3. Architectures

  4. U-Nets / Colorizing / Super-Resolution

  5. Bounding Boxes

Time Series

Graphs

Semi Supervised Learning

Deployments :DevOps:

Data Engineering

Misc

Spark

Data Drift Detection

Math :Math:

Physics