Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tree-building: port #835

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

guygastineau
Copy link
Contributor

@guygastineau guygastineau commented Aug 28, 2019

This doesn't point to a canonical-data.json file, because none exists: problem-specifications

I stole all of the tests from the go track. I think it would be good to have tests that verify it can handle more levels of recursion. go tests

I think I probably should have used Either instead of Maybe, but now we can have a discussion about what would be best.

This is a refactoring exercise. I might have made the gross version too gross, but it definitely can be refactored 🤣

I think that my correct solution could probably do with some refactoring too ;)

PS. I meant to send this up here last week, but I've been distracted by life itself. It feels good to get moving on this again ;)

@guygastineau
Copy link
Contributor Author

So, I think the gross solution that is supposed to be shipped with the exercise for the student to refactor got too many linting suggestions and made the Travis build fail.

@guygastineau
Copy link
Contributor Author

Actually a bunch of the linting warnings came from my clean solution too. I'll work on those.

@sshine do you know if that is why it failed?

@guygastineau
Copy link
Contributor Author

Okay, so I got rid of the linting errors for the solution in examples/success-standard/src, but we should be expecting warnings from the sub-optimal solution given to the students to refactor.

This appears to be why it is failing unless I am misreading the report from Travis. Is there someway to exempt files that are for refactoring exercises?

@guygastineau
Copy link
Contributor Author

I see that:

 hlint ${TRAVIS_BUILD_DIR} # Run `hlint` on the entire repository.

returns 1 as it's exit code if any warnings are given.

That seems like desirable behavior in general.

A number of the warnings I put in there intentionally, because they were redundant (to give some easy fixes during refactoring).

I suppose I could fix all of those problems. It would still all be in one giant nasty function.

Anyway, I will wait until I hear back from one of the core maintainers, before I decide on my next course of action here ;)

@guygastineau
Copy link
Contributor Author

So I just discovered that I can disable Hlint for that single file.

If you all like the grotesqueness of the file that will be sent for refactoring, then I suppose that is one way to solve the issue.

Copy link
Contributor

@sshine sshine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice job.

I thought of another solution that builds a Data.Map and then constructs the tree from that.

I think the stub that needs to be refactored looks fine. I don't have any ideas at this moment as to which kind of pitfalls to place in it beyond ones you've already placed. We can always reiterate on it later, if we like.


data Record = Record Id (Maybe Id) deriving (Eq, Show)

data Tree = Leaf Id | Branch Id Children deriving (Eq, Show)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go with Data.Tree from containers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I'll look into that.

cycles = any checkRecord
where checkRecord r =
case r of
(Record i Nothing) -> if' (i /= 0) True False
Copy link
Contributor

@sshine sshine Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(Record i Nothing) -> if' (i /= 0) True False
Record i Nothing -> i /= 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, that should be pretty obvious 😂

where checkRecord r =
case r of
(Record i Nothing) -> if' (i /= 0) True False
(Record i (Just p)) -> if' (p >= i) True False
Copy link
Contributor

@sshine sshine Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(Record i (Just p)) -> if' (p >= i) True False
Record i (Just p) -> p >= i

. map (\xs -> (recordParent (head xs), map recordId xs))
. groupBy parentsEq
. sortOn recordParent
. sortOn recordId
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will have the possibly unintuitive consequence that it'll primarily sort on recordParent and secondarily on recordId. An alternative is to sort with a combined function using the Monoid instance of Ordering described here:

sortBy $ \x y -> comparing recordParent x y `mappend` comparing recordId x y

or something similar...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool

-- General Tools

if' :: Bool -> a -> a -> a
if' p a b = if p then a else b
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer necessary.

build :: [ParentGrouping] -> Maybe Tree
build [] = Nothing
build (x:xs)
| not . validRoot $ x = Nothing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| not . validRoot $ x = Nothing
| not (validRoot x) = Nothing

in if not (null children)
then Branch yid (build' children)
else Leaf yid
) (snd y)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not exhaustive.

I might write it like:

build' :: [ParentGrouping] -> [Tree]
build' [(_, yids)] = map Leaf yids
build' ((_, yids):pgs) = map buildWithChildren yids
  where
    buildWithChildren yid =
      let children = [ child | child <- pgs, fst child == Just yid ]
      in if null children
         then Leaf yid
         else Branch yid (build' children)

And I might address that it's not exhaustive by concluding that it shouldn't be empty.


groupByParent :: [Record] -> [ParentGrouping]
groupByParent = sortOn fst
. map (\xs -> (recordParent (head xs), map recordId xs))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import Control.Arrow ((&&&))

(\x -> (f x, g x)) = f &&& g
Suggested change
. map (\xs -> (recordParent (head xs), map recordId xs))
. map (recordParent . head &&& map recordId))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I haven't used the Control.Arrow library before. It is exciting to see it in action here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only really use &&&, ***, first and second from Control.Arrow, and the only arrow I ever think about is (,).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may like to know that first and second actually live in two places nowadays: There is a better version in Data.Bifunctor. A bifunctor is considerably less scary than an arrow, I think.

-- verification
succIdCheck :: [Record] -> Bool
succIdCheck = all (\(x,y) -> succ x == y) . pairwise . sort . map recordId
where pairwise = zip <*> tail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this check is necessary.

Can't a database skip IDs as long as they're monotonically increasing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably. Taking all of the tests from the golang version of this exercise has the concequence of those tests incorporated simply by virtue of their being there.

I have not had to deal with the premis of this exercise in the real world. I suppose if users delete comments, then we could definitely end up with trees that have gaps, although a lit of forums seem to keep a marker that it is deleted.

I suppose that is what we have to think about here.
is this tree building for a system where users have the right to completely delete their comments? or is it creepy like facebook.?


import Data.List
import Data.Ord (comparing)
--import Data.Maybe (fromMaybe)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure an unused import is necessary in the stub.

Suggested change
--import Data.Maybe (fromMaybe)

@sshine
Copy link
Contributor

sshine commented Aug 28, 2019

I see that:

hlint ${TRAVIS_BUILD_DIR} # Run `hlint` on the entire repository.

returns 1 as it's exit code if any warnings are given. That seems like desirable behavior in general. A number of the warnings I put in there intentionally, because they were redundant (to give some easy fixes during refactoring).

You're quite right, with refactoring exercises we don't want hlint to fail in CI because of intentional mistakes. It should be possible to add a .hlint.yaml containing ignore directives:

- ignore: { within: [ TreeBuilding ] }

But I'm not sure how to distinguish the TreeBuilding of the stub with the TreeBuilding of the example solution. It's not possible to specify the directory in which to ignore a module, so perhaps we should look into replacing a repository-wide hlint . with local ones. I'm not really sure, though.

@guygastineau
Copy link
Contributor Author

It appears like we can put the ignore rules in the stub file as ANN pragmas for it to apply specifically to that file.
hlint manual
Unfortunately there are no anchors on the site for # linking, but the info is near the bottom.

The only issue I see is that is would introduce a lot of clutter in the file, and we would probably need to add nformation in the README.md telling the student to remove these pragmas as they fix the problems.

@guygastineau
Copy link
Contributor Author

I think I responded in some way to each thing 😆

I have to move a 1400 lb millingachine today, so I won't get any more work up until this evening in US/EST.

@sshine as always thank you very much for your constructive guidance 😁

@@ -0,0 +1,21 @@
name: tree-building
version: 1.0.0.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since there is no canonical-data.json, by what's written in https://github.com/exercism/haskell/blob/master/README.md#exercise-versioning please use 0.1.0.s instead of 1.0.0.s version numbers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, cool. I wasn't sure of the protocol.

@@ -0,0 +1,21 @@
name: tree-building
version: 1.0.0.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since there is no canonical-data.json, by what's written in https://github.com/exercism/haskell/blob/master/README.md#exercise-versioning please use 0.1.0.s instead of 1.0.0.s version numbers

@guygastineau
Copy link
Contributor Author

Just an update. I've been super busy recently, and I will be through Saturday.
Should get some time early next week to finish up on this project 😄

@sshine
Copy link
Contributor

sshine commented Aug 30, 2019

It appears like we can put the ignore rules in the stub file as ANN pragmas for it to apply specifically to that file.

I'm sorry that I didn't address this in my research: Yes, we could do that, but then we would distribute the pragma to the student, which would cause HLint to be disabled on their end, which we really don't want. So ideally we should find a workaround HLint not having an --ignore-file or --ignore-directory feature, which most likely will complicate the very neat hlint . at the root.

The long-term benefit we get from working out this part is that we make way for handing out stubs with warnings in them for other refactoring exercises. I think having obvious errors that HLint catches is a good basis for refactoring, since it encourages the use of HLint.

@guygastineau
Copy link
Contributor Author

Okay, those are some compelling reasons to get this figured out the right way regarding hlint 😁

Copy link
Contributor

@sshine sshine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another implementation and some thoughts regarding the stub and test files:

  • How about naming the stub file ForumThread.hs?
  • Subsequently, module ForumThread where ....
  • How about renaming newTree into fromList?
  • How about renaming Tree into ForumThread?

Your tests include a lot of bad input cases also present in the F# version in which no trees are produced. One could imagine they're a product of some bad SQL JOIN (in case of duplicates), SELECT (in case of wrong order), DELETE (in case of missing parents) or some wicked UPDATE (in case of cycles).

We assume that records are sorted and hope that duplicates and cycles don't occur. But if these promises are broken, it's really a problem in the producer of these records. I thought it might be interesting to deal a little differently with the various errors:

  • "Duplicate", "Duplicate root": Instead of failing to produce a tree, what we'd rather do in a production environment is discard the duplicate and log a warning. This could be an excellent opportunity to try the validation package ("A data-type like Either but with an accumulating Applicative") or the co-log package (production-grade logging framework): Accept threads with duplicates by discarding them, and encourage in a later iteration to add either cumulative error handling or logging of these.
  • "Non-continuous": We assume that records are sorted, but test what happens if the assumption is broken? How about either (1) not assume this and require sorting, or (2) not test this and assume that the bug is located outside the function we're writing? In case of (1) it's just as easy to sort as it is to handle the error, and in case of (2) it's even easier.
  • "Cycle Directly", "Cycle indirectly": We can disregard cycles by dealing with records ordered by their ID: Record x (Just y) is always invalid if x > y (y is at this point an unknown parent). Similarly to duplicates, we could simply discard records that can't be inserted because of an unknown parent and encourage error handling / logging in a later iteration.
  • "Higher id parent of lower id": Same as "Non-continuous": How about either (1) not assume records are sorted and require sorting, in which case this is not an error, or (2) not test this and assume that the bug is located outside the function we're writing. Maybe a bug like this could occur if a record was "moved" from one thread to another. Maybe this is an acceptable anomaly as long as it doesn't produce bad trees.

TL;DR:

  • Proposal: Discard duplicates and pseudo-cyclical records, change tests so they're positive edge cases rather than negative cases.
  • Proposal: Either assume sorting and remove testing of non-sorted records, or require sorting and change negative tests of non-sorted records to positive tests.
module TreeBuilding (fromList, Record(..), Thread) where

import Debug.Trace (trace)

import Data.Function ((&))
import Data.Maybe (fromMaybe)
import Data.Ord (comparing)
import           Data.List (sortBy, foldl')

import qualified Data.Map as Map
import           Data.Map (Map)
import           Data.Tree (Tree(..))

type Id = Int
type Thread = Tree Id
data Record = Record
  { recordId       :: Id
  , recordParentId :: Maybe Id
  } deriving (Eq, Show)

type ParentId = Id
type ChildId = Id
type ParentChildMap = Map ParentId [ChildId]

fromList :: [Record] -> Maybe Thread
fromList = threadWithRoot 0
         . foldl' (flip insertRecord) Map.empty
         . sortBy (comparing recordParentId <> comparing recordId)
  where
    insertRecord (Record recordId Nothing) m
      | Map.member recordId m = trace ("Duplicate parent " ++ show recordId ++ "!") m
      | otherwise = Map.insertWith (++) recordId [] m

    insertRecord (Record recordId (Just parentId)) m
      | Map.member recordId m = trace ("Duplicate child " ++ show recordId ++ "!") m
      | Map.member parentId m =
          Map.insertWith (++) recordId []
            (Map.adjust (++ [recordId]) parentId m)
      | otherwise = trace ("Unknown parent " ++ show parentId ++ "!") m

threadWithRoot :: ParentId -> ParentChildMap -> Maybe Thread
threadWithRoot rootId parentChildMap = buildMaybe rootId
  where
    buildMaybe parentId = parentChildMap
                        & Map.lookup parentId
                        & fmap (Node parentId . map build)

    build parentId = fromMaybe (pure parentId) (buildMaybe parentId)


groupByParent :: [Record] -> [ParentGrouping]
groupByParent = sortOn fst
. map (\xs -> (recordParent (head xs), map recordId xs))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only really use &&&, ***, first and second from Control.Arrow, and the only arrow I ever think about is (,).

@guygastineau
Copy link
Contributor Author

I'll finally have some time for this tonight !

I looked at Data.Tree and it is exactly what I need.

@sshine
Copy link
Contributor

sshine commented Sep 3, 2019

A last thought -- sorry for spamming you -- if we're giving a

data Thread = Leaf Id | Branch Id Children

in the stub, but we'd like people to use

import Data.Tree (Tree)
type Thread = Tree Id

then how can we express test cases abstractly? We would need to hide the concrete constructors (Leaf, Branch for the stub, Node for the example) and instead export functions for extracting the posts of a thread an abstract constructor, i.e.

module ForumThread (Record, Thread, fromList, thread) where

...

-- In the stub
thread :: Id -> [Thread] -> Thread
thread recordId [] = Leaf recordId
thread recordId children = Branch recordId children

-- In the example
thread :: Id -> [Thread] -> Thread
thread = Node

Then we can express expected trees in the test file using e.g.

Just (thread 0 [ thread 1 [], thread 2 [] ])

regardless of internal representation.

Then it can become an objective to change the internal representation to Data.Tree without affecting the test suite.

@guygastineau
Copy link
Contributor Author

@sshine that is amazing!

Thank you for sharing that strategy with me 😍

@@ -0,0 +1,21 @@
name: tree-building
version: 1.0.0.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upon further review of other example package.yaml, I find that we just don't have version in example package.yaml, likely so we don't need to update them in multiple places. So this one should be removed

exposed-modules: TreeBuilding
source-dirs: src
ghc-options: -Wall
# dependencies:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these lines can (and should) be removed from example's package.yaml

@sshine
Copy link
Contributor

sshine commented Sep 5, 2019

@guygastineau: I feel that I've left you a bit of a mess wrt. feedback.

I'll try and summarize my proposal of what we need to do from here:

  • Rename the stub module into ForumThread
  • Rename the main type the module exports into ForumThread
  • Rename the main function that the module exports into fromList
  • Create and use an abstract thread constructor in the stub, test and example, so students can refactor data ForumThread = ... into (new)type ForumThread = Tree Id without having to change the tests.
  • Fix imports and exports in the stub, test and example to something like Record(..), Thread, thread, fromList.
  • Make a decision about some of the negative cases, as I've outlined above.

In this gist I've updated the stub file and the tests so they conform to these proposals, and the only thing I've left commented out are the negative cases.

As for your example solution: Feel free to stop refactoring whenever you feel like it. The important part here is that it's a learning experience for you and that it behaves the same as the stub file. We can always take iterations on improving it further for your sake after merging the important parts.

@guygastineau
Copy link
Contributor Author

@sshine thank you for your detailed work.

I will see how far I can take this with your guidance ;)

I'm sorry I've left this PR up without progress for so long. A new project for a startup is taking a lot of time from my days.

@guygastineau
Copy link
Contributor Author

Hey @sshine That is a really good Gist! I appreciate all of your guidance here ;)

I am still swamped with real work projects for a start up right now. I could probably implement these changes over the next day or so, but if you are willing simply to change this to the better version I would be grateful.

I am sorry that I have left this PR hanging for so long on your track.

@sshine
Copy link
Contributor

sshine commented Sep 13, 2019

@guygastineau: We've had PRs open for a year, it really is no inconvenience. I'll look at it if you prefer to not continue with it. Thanks for all the effort so far!

@guygastineau
Copy link
Contributor Author

@sshine your generosity of spirit is only matched by your expertise.

Though it might take me til Monday to have it in good shape I suppose I should do it (it will help help me be a better Haskeller). Thank you for dissuading my anxieties about longer running PR's.

@guygastineau
Copy link
Contributor Author

Thank you for the review @petertseng

I can try to do some work on this this week, but I am still utterly swamped.

I have taken a huge interest in Scheme, and I am trying to help revitalize that6 very ignored track with another enthusiast right now ;)

I have done most of the work for the bob-andfriends` exercise idea, so I expect to get that up here soon.

Even though I said I should finish this PR for my benefit, I honestly would be grateful to anyone who wanted to finish this tree building port up.

@sshine
Copy link
Contributor

sshine commented Sep 26, 2019

would be grateful to anyone who wanted to finish this tree building port

You already did the heavy lifting.

I'm playing with stateful tree generators for this one, so it may take a couple of days.

@guygastineau
Copy link
Contributor Author

guygastineau commented Sep 26, 2019

@sshine cool. I am excited to see your work. I imagine I will learn a lot from reading it 🤓

In the mean time I will get the bob-and-friends PR up.

@sshine
Copy link
Contributor

sshine commented Sep 26, 2019

I've gotten seriously frustrated at QuickCheck-GenT. It's old, so it seems you need to run the stateful generator using something like

instance Arbitrary Thread where
  arbitrary = (`evalState` state) <$> runGenT (sized threadGen)

I ditched the thing when I couldn't easily call on deriving (..., MonadState ..., MonadGen ...) having to explicitly lift non-GenT features from QuickCheck's Gen like shuffle. Meh. So I'm redoing it with Hedgehog.Gen that is naturally implemented mtl-style. Hedgehog deals with generating recursive structures in a less primitive way, so I want to learn to do that part right.

@guygastineau
Copy link
Contributor Author

Well, this sure has been open for along time ;)

I have learned a lot since I tried porting this exercise. I think I will give it another go now.

Base automatically changed from master to main January 28, 2021 19:15
@sshine
Copy link
Contributor

sshine commented Feb 10, 2021

@guygastineau: Cool :)

I think I learned something about how to not give feedback doing this one, haha. I think if you regard all of the comments above as one big brainstorm and do the exercise without constraints, maybe it will be less critical. Feel free to decide whether you want to close this one and open a new one, or just work from here and not let the comments made so far disturb your creative process too much.

@sshine
Copy link
Contributor

sshine commented Feb 10, 2021

@guygastineau: Upon revisiting the comment thread in this PR, I see this comment containing a link to a proposed copy of the handed-out solution and a test suite that is compatible with it. You would only need to make an example solution that you like and conforms to this test suite (which was made flexible wrt. internal representation, so that this could become a part of the task).

@guygastineau
Copy link
Contributor Author

@sshine hahaha. Well, I think your feedback was useful even though some of it lost me at the time ☺️

I got interested in this again, because I thought I might be able to use recursion schemes in the example. I'll see if it is practical with the internal representation. Still, my point is that your advice was good, but it was over my head at the time. The above brainstorming session makes much more sense now that I have gone on to learn and build more Haskell projects in the wild that do actual work. My ongoing quest to learn intuitionistic type theory is also working out (though it is hard), that and some explorations in compilers has made a lot of things clearer for me ☺️

I will need to see if I can get tools here to work for me. Last time (for some reason I can't understand) I did not reach out about the testing tools not working. Is stack still set up to use some sandbox directory to run all of the tests?

IDK, I will re-familiarize myself with the process of testing and contributing to this project. If I encounter similar issues using the repo's utilities I will reach out for help with that this time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants